JOURNAL OF 
CONSULTING
PSYCHOLOGY 


AMERICAN PSYCHOLOGICAL ASSOCIATION











JOURNAL OF CONSULTING PSYCHOLOGY 


Edited by Laurance F. Shaffer
TEACHERS COLLEGE, COLUMBIA UNIVERSITY 


Associate Editors: Edgar A. Doll, Devereux Schools, Devon, Pa. William A. Hunt, Northwestern University.
E. Lowell Kelly, University of Michigan. Morris Krugman, Board of Education, New York, N.Y.
Bertha M. Luckey, Board of Education, Cleveland, Ohio. Fred McKinney, University of Missouri. Anne
Roe, New York, N.Y. Carl R. Rogers, University of Chicago. R. Nevitt Sanford, University of California.


Margaret K. Harlow, Managing Editor


The Journal of Consulting Psychology is the clinical journal of the American Psychological
Association. It is devoted primarily to the publication of original research relevant
to psychological diagnosis, psychotherapy and counseling, personality, and the dynamics of
behavior. Case studies, theoretical contributions, descriptions of clinical techniques, and
discussions of the professional problems of clinical psychology also appear in its pages.


Editorial Office


Manuscripts submitted for publication, books and tests for review, and correspondence on
editorial matters should be sent to the Editor, Laurance F. Shaffer, 525 West 120th St., New
York 27, N. Y.


Information for contributors: The important
characteristics of a good article are the significance of its
subject matter, the demonstrated soundness of its
conclusions, its clarity, and its brevity. Clarity is
[...] by careful outlining, discriminating use of
[...] style of [...]. For brevity, authors should strive to convey
every thought without waste of words, and should
avoid repetitions of methods, data, or findings.


Some of the mechanical requirements for
manuscripts are: Type manuscripts with wide margins,
on white bond paper. Double space between each and
every line, including references, footnotes, matter to
be set in smaller [...]. Use only two types
of headings: centered headings, and underlined
paragraph headings. Type each table, however small,
[...] with Arabic number and title.
[...] figures in India ink in final form for
reproduction [...] with draftsman's [...]
on separate sheets. References must [...]
exact form used in APA journals. For
further information on the details of preparing
manuscripts, see

Anderson, J. E., and Valentine, W. L. The
preparation of articles for publication in the
journals of the American Psychological Association.
Psychol. Bull., 1944, 41, 345-376.


There is a lag of about nine to twelve months be- 
tween the acceptance of an article and its publica- 
tion. Authors may secure immediate publication, 
however, by arranging for articles to be printed as
extra pages, increasing the number of pages that 
subscribers receive. The cost to the author of such 
early publication is approximately $11.00 per print- 
ed page. 


Authors receive fifty reprints gratis, and may
order additional quantities when returning proof.
Authors are charged one-half of the extra cost of
tabular material and cuts, and the full cost of changes
made on the proof.


Business Office
Communications concerning subscriptions, change of address, claims for the nonreceipt of a 
number, advertising, and other business matters should be sent to the American Psychological 
Association, Inc., 1515 Massachusetts Ave., N. W., Washington 5, D. C.
The Journal of Consulting Psychology is published bi-monthly in February, April, June, 
August, October, and December, the yearly volume comprising approximately 480 pages. The 
subscription per year is $5.00, foreign $5.50, single number $1.00.


Copyright 1951 by the American Psychological Association, Inc. Printed and mailed by the Dentan [...],
[...] West Colorado Ave., Colorado Springs, Colo., and entered as second-class matter at the
[...] at [...], Colo. [...] mailing at special rate of postage provided for in
[...] 538, Act of February 25, 1925, [...] 1940.








April, 1951 Vol. 15, No. 2 


CONTENTS 
On Recent Usage of the Einstellung-Effect as a Test of Rigidity: Abraham S. Luchins

The Accuracy of Self-Evaluations: Its Measurement and Some of Its Personological
Correlates: Robert R. Holt

A Preliminary Test of Role-Playing Ability: William A. McClelland

The Construction and Validation of Two Insight Inventories: David Grossman

Differences in Prediction Based on Hearing versus Reading Verbatim
Clinical Interviews: Joseph Luft

A Forerunner of Rorschach: Milton S. Gurvitz

Color and the Validity of the Rorschach 8-9-10 Per Cent: Janet A. Perlman

Norms for “Shock” in the Rorschach: Herbert Sanderson

The Chance Distribution of Szondi Valences: Jacob Cohen

The Internal Structure of the MMPI: William Marshall Wheeler,
Kenneth B. Little, and George F. J. Lehner

An Interpretive Aid for the Sc Scale of the MMPI: Stanley J. Benarick,
George M. Guthrie, and William U. Snyder

The Measurement of Intellectual Decline in the Senile Psychoses:
Jack Botwinick and James E. Birren

Differences between Neurotics and Schizophrenics on the Wechsler-Bellevue Scale:
Lawrence S. Rogers

Classical and Standard Score IQ Standardization of the I.P.A.T. Culture-Free
Intelligence Scale 2: Raymond B. Cattell

Personality Inventory Data Related to ACE Subscores: Carol L. Pemberton

The Validity of the Hewson Ratios:
John I. Wheeler, Jr., and Walter L. Wilkins

New Books and Tests








ON RECENT USAGE OF THE EINSTELLUNG-EFFECT 
AS A TEST OF RIGIDITY¹


ABRAHAM S. LUCHINS


MCGILL UNIVERSITY 


FIFTEEN years ago I became interested,
under the guidance of the late Max
Wertheimer, in studying mechanization
in behavior. The procedure utilized consisted 
in giving subjects a series of problems all solv- 
able by one method, followed by similar-ap- 
pearing problems solvable by a relatively sim- 
pler and more direct method. We were inter- 
ested in ascertaining whether the subject would 
attempt to deal with these latter problems by 
the previously repeated method, and thus fail 
to utilize the more direct method. Among the 
devices employed was a series of volume-meas- 
uring problems, a series of mazes, a series of 
anagrams, a series of pictures, and a series of 
geometric problems. The volume-measuring
problems can serve as a representative sample.
The problems were as follows:


1. Given: an empty 29-quart jar and an empty
3-quart jar; measure 20 quarts of water.


2. Given: an empty 21-quart jar, an empty 127- 
quart jar, and an empty 3-quart jar; measure 
100 quarts of water. 

3. Given: an empty 14-quart jar, an empty 163- 
quart jar, and an empty 25-quart jar; measure 
99 quarts of water. 

4. Given: an empty 18-quart jar, an empty 43- 
quart jar, and an empty 10-quart jar; measure 
5 quarts of water. 

5. Given: an empty 9-quart jar, an empty 42- 
quart jar, and an empty 6-quart jar; measure 
21 quarts of water. 

6. Given: an empty 20-quart jar, an empty 59- 
quart jar, and an empty 4-quart jar; measure 
31 quarts of water. 

7. Given: an empty 23-quart jar, an empty 49-
quart jar, and an empty 3-quart jar; measure
20 quarts of water.

8. Given: an empty 15-quart jar, an empty 39-
quart jar, and an empty 3-quart jar; measure
18 quarts of water.

9. Given: an empty 28-quart jar, an empty 76-
quart jar, and an empty 3-quart jar; measure
25 quarts of water.

10. Given: an empty 18-quart jar, an empty 48-
quart jar, and an empty 4-quart jar; measure
22 quarts of water.

11. Given: an empty 14-quart jar, an empty 36-
quart jar, and an empty 8-quart jar; measure
6 quarts of water.

¹Condensed from the paper, Mechanization in Men
and Rats, presented at the Canadian Psychological
Association, Toronto, Canada, 1950.


If the jars in the order presented in each
statement of the problem are labeled A, B, and
C, respectively, then the method which solves
problems 2 through 6 can be formulated as
B − A − 2C; for example, 127 − 21 − 2 × 3
= 127 − 27 = 100. This method is known as
the Einstellung or E method, and problems 2
through 6, which are solvable by this method,
are known as Einstellung or E problems.
Problems 7 through 11 are known as test problems
since they indicate the effect of the E method
in the previous problems. Problems 7, 8, 10,
and 11 are solvable not only by the E method
but by more direct methods: A − C (problems 7
and 11) or A + C (problems 8 and 10). The ninth
problem, however, cannot be solved by the
Einstellung method, but is solvable by the A − C
formula. This problem has been called an
extinction task. If an individual developed an
Einstellung in the first two test problems
(problems 7 and 8), will he recover from the
Einstellung due to the effect of problem 9 and
thus solve the last two test problems (problems
10 and 11) by the more direct method? In
addition to serving as a factor to bring about
recovery from the Einstellung, this ninth
problem serves also as a
crucial test of the strength of the Einstellung
effect. Will the individual become so habituat- 
ed to the E method that he will fail to solve this 
problem? Because of the nature of the tasks, 
the solution of the first two test problems is 
taken as an indication of whether or not the in- 
dividual developed an Einstellung, and the 
solution of the last two problems is taken as 
an indication of whether or not he has recov- 
ered from the Einstellung. 
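The arithmetic behind this classification can be checked mechanically. The short Python sketch below is an editorial illustration, not part of the original procedure: it encodes problems 2 through 11 as (A, B, C, target) tuples and reports which of the three formulas named above (B − A − 2C, A − C, A + C) yields the required amount. The names PROBLEMS, FORMULAS, and solving_methods are hypothetical, and the check covers only net arithmetic, not the sequence of fillings and pourings a subject would actually carry out.

```python
# Editorial illustration (not part of the original study).
# Each entry: problem number -> (capacity of A, B, C in quarts, amount to measure).
PROBLEMS = {
    2: (21, 127, 3, 100),
    3: (14, 163, 25, 99),
    4: (18, 43, 10, 5),
    5: (9, 42, 6, 21),
    6: (20, 59, 4, 31),
    7: (23, 49, 3, 20),
    8: (15, 39, 3, 18),
    9: (28, 76, 3, 25),
    10: (18, 48, 4, 22),
    11: (14, 36, 8, 6),
}

# Net-arithmetic forms of the three methods discussed in the text.
FORMULAS = {
    "E": lambda a, b, c: b - a - 2 * c,  # fill B, pour off A once and C twice
    "A-C": lambda a, b, c: a - c,        # fill A, pour off C once
    "A+C": lambda a, b, c: a + c,        # combine a full A and a full C
}


def solving_methods(a, b, c, target):
    """Return the names of the formulas whose result equals the target."""
    return [name for name, f in FORMULAS.items() if f(a, b, c) == target]


if __name__ == "__main__":
    for number, (a, b, c, target) in PROBLEMS.items():
        print(number, solving_methods(a, b, c, target))
```

Run as a script, it reports only E for problems 2 through 6, E together with one direct method for problems 7, 8, 10, and 11, and only A-C for problem 9, in agreement with the account above.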

Of 1,093 subjects to whom the above described
test was presented, 83 per cent solved the first
two test problems in the E manner and 64 per
cent failed to solve problem 9. In contrast, of
the 970 subjects to whom only the test problems
were presented (problems 2 to 6 were not given),
only 0.6 per cent of the subjects used the E
method in the first two test problems,
and only 5 per cent failed to solve the extinc- 
tion problem. These results would seem to in- 
dicate that the repeated use of the same meth- 
od of solving problems blinded subjects to the 
more obvious and direct ways of dealing with 
them. These results have been corroborated in 
subsequent experimentation involving over ten 
thousand subjects. Large Einstellung effects 
were obtained in all experimental groups, 
whether they were composed of elementary 
school children, high school children, college 
students, graduate students, adults with no for- 
mal education, or college professors. Although 
the groups did not differ greatly in Einstellung 
effect, they did differ strikingly in the amount 
of recovery from it. The various methods in- 
troduced to produce recovery were usually 
not effective in the elementary school groups 
but were effective in the other groups. 

Intensive investigation aimed at understand- 
ing what brings about this mechanized be- 
havior was undertaken. Experimental vari- 
ations were conducted to bring about extremes 
in results, that is, 100 per cent Einstellung
effect on the one hand and 0 per cent Einstellung
effect on the other. Not only were we interested 
in finding out which attitudes, assumptions, 
and social atmospheres would bring about these 
extreme results, but what methods would bring 
about 100 per cent recovery in groups which 
had shown less. Details about these variations 
have been reported elsewhere [5]. 

The results of all these experiments seem to
point to the conclusion that basically different
phenomena appear to be involved in overtly
similar E reactions. For example, some subjects
repeat the E method automatically; when later
shown the direct method, they are surprised
and shocked to see how easily the test problems
can be solved, and they are puzzled or ashamed
about how blind they were to the direct method.
On the other hand, there are subjects who
use the E method for certain reasons, acting on 
the basis of interpretations and assumptions 
concerning their task; for example, they think 
that the E method is the rule for these prob-
lems, or that the experimenter wants them to
use this method, or that all three jars must be 
employed. One cannot, therefore, take every E
solution to a problem as an indication of blind,
mechanical behavior. In order to understand how
Einstellung effects are brought about in a particular
instance it is important to study the thinking 
processes which led to the results. 

With regard to theoretical explanations of 
Einstellung behavior, my experiments did not 
lead to a clear, positive formulation, but they 
did show what the Einstellung effect may not 
be. The Einstellung behavior cannot be ade- 
quately understood as long as we center on the 
individual qua individual; that is, as long as we
assume that it is due to something in the re- 
spondent’s nature. Field conditions seem to in- 
fluence the Einstellung behavior. Thus, making 
a speed test out of the experiment proved very 
effective in producing 100 per cent or almost 
100 per cent E solutions of the test problems,
even in groups which had previously shown less 
Einstellung effect. Making a speed test out of 
the experiment even vitiated the possible effects 
of factors introduced to prevent Einstellung 
effects or to produce recovery from them. Not 
speed of response in and of itself but the man- 
ner in which the subject reacted to the pres- 
sure of timing brought about the blinding 
effects. Detailed accounts of the apparent ef- 
fects of various social atmospheres on the Ein- 
stellung phenomena are contained in the pre- 
viously mentioned monograph [5]. 

Thus ends the summary of the first stage of 
research on the Einstellung effect. Let us now 
consider the second stage. In everyday life and 
in the clinic there are innumerable instances of 
stereotyped, mechanical behavior. Clinicians, 
and even social psychologists, have become in- 
terested in studying and in measuring this be-
havior, usually calling such behavior “rigidi-
ty.” While serving as a clinician in the U.S.
Army and later for the U.S. Veterans Ad- 
ministration, I attempted to see whether the be- 
havior in the experimental tasks used to study 
Einstellung effects could throw light upon 
what clinicians call “rigidity.” 

It is important to point out that I was in- 
terested in seeing how many factors and what 
kind of factors have to be introduced to bring 
about recovery from the Einstellung. I was 
primarily concerned with observing how the pa- 
tient reacts when confronted with changing 
conditions. It was hoped that eventually the 
tests might have some predictive value in indi- 
cating the treatability of the patient and might 
perhaps serve as a diagnostic indicator of or- 
ganic involvement in mental illness. 

The Einstellung effect has recently been 
utilized by other psychologists to study rigidity. 
I know of ten dissertations being written, at 
present, in which the Einstellung effect is used 
as a measure of rigidity. In one of the most fre- 
quently quoted studies, in which the Einstel- 
lung effect was utilized, that of Rokeach [10], 
rigidity was defined as “the inability to change 
one’s set when the objective conditions demand 
it, as the inability to restructure a field in 
which there are alternative solutions to a prob- 
lem in order to solve that problem more effi- 
ciently.” But, when one examines the test 
which Rokeach finally selected to measure 
rigidity, one finds that no problems similar to 
the ninth one described above, extinction prob- 
lems, were used, but only critical problems. 
In these latter problems the use of the E meth- 
od results in a solution. Moreover, the E meth- 
od might be the most efficient method to 
accomplish the task set by the experimenter, 
especially since the subjects were told that they 
were being tested to see how quickly and effi- 
ciently they could solve simple arithmetical 
problems. 

Rokeach [10], Else Frenkel-Brunswik [1, 
2], and others who have reported upon their 
findings with the Einstellung “test of rigidi- 
ty,” seem to be utilizing the volume-measuring 
problems without taking into account the find- 
ings of experimentation aimed at understand- 
ing what actually takes place in such a situ- 
ation. They err in assuming that every E solu- 
tion to a test problem is brought about by the
same psychological process, namely, rigidity
of behavior. Rather than study the process 
which brings about the E solution of the 
criticals, they concentrate on the endproduct of 
the process, the overt response, and label it ac- 
cording to their interpretation of the process: 
rigidity. Not what the subject did, but the in- 
vestigator’s assumptions as to what he did, is 
the basis of evaluation of a response. 

Moreover, the alleged rigidity in solving the 
criticals is taken as an indication of rigidity in 
the respondent's personality or of rigidity in his 
ego-defense system. His behavior is rigid be- 
cause he possesses rigidity. One is reminded of 
the outmoded belief that a thing burns because
it has fire in it. Some clinicians believe that 
this rigidity is in the nature of a generalized 
factor which holds, say, for all aspects of an in- 
dividual’s behavior. Others see different parts 
of the personality structure as being character- 
ized by different degrees of rigidity. In most of 
this work the answer to rigidity of behavior is 
sought for in the respondent; it is considered as 
relatively independent of the field conditions 
under which the individual is operating. This 
approach ignores the chief finding of experi- 
mentation with the volume-measuring prob- 
lem—that is, that the Einstellung behavior is 
influenced by field conditions and cannot be 
understood merely as a characteristic of the in- 
dividual’s mental makeup. 

Confidence in the belief that they are dealing 
with an all pervasive rigidity factor in the indi- 
vidual’s personality is gained by some of these 
investigators when they find a positive relation- 
ship—usually a low one—between the scores 
made by the individual on various subtests of a 
battery designed to test rigidity. Incidentally, 
results on the various subtests of my manual on 
rigidity [7] have to date revealed no clear-cut 
positive relationship. But even if a positive cor- 
relation were generally the case, it would not 
follow that field conditions have been ruled out 
as determinants of behavior. The social atmos- 
phere of the test situations, the attitudes, inter- 
pretations, and methods of attack carried over 
from one subtest to another may help to create 
similar field conditions in the various subtests. 

Psychologists are not merely using the Ein- 
stellung effect as a measure of rigidity but are 
offering explanations of the so-called rigidity 
manifested in the volume-measuring problems. 













Explanations are offered in terms of existing 
theories of rigidity of behavior. What is for- 
gotten is that these explanations stem from 
studies which may not be dealing with phe- 
nomena similar to the Einstellung effect. For 
example, rigidity on the Rorschach, the Ben- 
der-Gestalt, the tests prepared by Scheerer 
and Goldstein, the Vigotsky, the perseveration 
tests of Spearman and his students, the meth- 
ods used by Lewin and his students, may not 
all be brought about by the same process even 
though they are all labeled “rigidity.” Just be- 
cause the Einstellung effect is used as an in- 
dicator of rigidity does not make it possible 
to explain what takes place in the Einstellung 
tasks in terms of what is known about rigid- 
ity from other tests. 

It is becoming stylish to explain the Ein- 
stellung effect as due to (a) lack of vari-
ability, (b) lack of frustration tolerance, (c)
intolerance of ambiguity, or (d) concrete-mind-
edness. Because of the popularity of these ex-
planations, we have investigated, in a preli- 
minary manner, the relationship among these 
factors. Let me briefly survey the findings of 
two of the many studies aimed at checking 
some of the assertions. 

On the assumption that the more rigid the 
behavior the more fixed and less variable it is, 
rigidity is sometimes regarded as inversely or 
negatively related to variability — the more 
variable animal is less rigid [4, pp. 300-311, 
339-347]. 

With regard to variability, we find that a 
veritable lion has been seized by the tail. How 
is one to measure variability? If one takes as 
an indication of variability the number of re- 
versals in perspective in viewing a Necker 
Cube in a given time, or the number of ob- 
jects or geometric designs reported as seen in 
a complex drawing, or the number of differ- 
ent ways a subject is able to solve such 
problems as this one: given 10-, 25-, 5- 
quart jars, find as many different ways as pos- 
sible to get 5 quarts of water with the given 
jars as measures—then our results indicate 
that variability is positively related to Ein- 
stellung effect. Even in our animal experi- 
ments, it was found that the more variable 
animal prior to mechanization, as tested by 
mazes which allowed for various alternative 


solutions, was more readily mechanized and 
persisted longer in responding in a mechanized 
manner. We do not think, however, that it is
possible to conclude from the above findings
that variability is positively related to Einstellung
effect. These experiments point out a number
of interesting problems with regard to the 
measurement of variability. Are all responses 
to be given equal weight, and is the mere sum 
of responses to be the index of variability? 
Should a distinction be made between minor 
variations of a solution and genuine alter- 
natives? If so, on what basis is the distinction 
to be drawn? Is account to be taken of the ex- 
tent to which the response fits the given situ- 
ation, is required by, is adequate to the situ- 
ation? It would seem that the concept of vari- 
ability in problem solving is in need of recon- 
sideration. 
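The scoring questions above can be made concrete with the 10-, 25-, and 5-quart example. The Python sketch below applies a hypothetical scoring rule, not one used in these experiments: it counts net combinations k1·10 + k2·25 + k3·5 = 5, where a positive coefficient adds a jar's contents, a negative one pours them off, and each coefficient lies between −limit and +limit. The function count_ways and its limit parameter are assumptions introduced for illustration.

```python
from itertools import product

CAPACITIES = (10, 25, 5)  # quarts, as in the example in the text
TARGET = 5


def count_ways(limit):
    """Count coefficient triples with |k| <= limit whose net volume equals TARGET.

    Hypothetical scoring rule for illustration only: each distinct triple of
    coefficients counts as one "way", with no weighting by plausibility.
    """
    coeffs = range(-limit, limit + 1)
    return sum(
        1
        for ks in product(coeffs, repeat=len(CAPACITIES))
        if sum(k * cap for k, cap in zip(ks, CAPACITIES)) == TARGET
    )


if __name__ == "__main__":
    for limit in (1, 2, 3):
        print(limit, count_ways(limit))
```

With limit=1 the rule finds two ways (fill the 5-quart jar; 10 − 5); with limit=2 it finds five, adding 25 − 10 − 10, 25 − 10 − 5 − 5, and 10 + 10 + 5 + 5 − 25. Whether those additions are minor variations or genuine alternatives, and so whether a raw count is a fair index of variability, is exactly the question raised above.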

As another example of the explanations of- 
fered, consider the assertions that subjects 
who use the E method of solving the critical
problems are more concrete-minded in their 
mode of thinking than those who do not. We 
doubted this claim because in experiments in 
which subjects were given the task of measur- 
ing water with real jars, those who abstracted 
a general method of solving the tasks rather 
than solving each problem on its own merits 
tended to develop an Einstellung; that is, 
those who abstracted from the living reality 
of the situation were usually those who used 
the E method [cf. 9]. Moreover, in experi- 
ments in which subjects were told to get 
“zero quarts of water,” some subjects used 
the E method: filling the center jar and then 
emptying it by measuring with the two re- 
maining jars. Were these subjects concrete- 
minded or abstract-minded? To avoid such 
questions, let us take as an index of concrete- 
mindedness the score on the similarities sub- 
test of the Wechsler-Bellevue test. Examin- 
ation of the answers given to the pairs of 
words on this subtest could give some answer 
to the question of whether the subjects who 
develop an Einstellung are more concrete- 
minded than those who do not. We did not 
find a significant difference between the sub- 
jects who developed an Einstellung and those 
who did not, on any of the pairs of words. 
Moreover, when these same subjects were
told to indicate another way in which the 
pairs of words are similar, and then again to 
give still another similarity, it was found 
that subjects who first gave abstract answers 
often failed to see another similarity. They 
overlooked the numerous so-called concrete 
and functional similarities between the two 
words. It would seem that being abstract- 
minded, as measured by the Wechsler-Belle- 
vue, does not guarantee that one will shift 
his mode of approach when it is demanded, 
that one will exhibit “nonrigid” behavior. I 
hope to report in a future publication on the 
several experiments studying this problem. 
Suffice it to say here that the relationship be- 
tween concrete-mindedness and rigidity is not 
as clear-cut as some clinicians purport it to be. 
Also, one can raise problems similar to those 
we raised regarding variability: namely, what 
is a sensible and what is a senseless answer,
what is a genuine and what a superficial abstraction?


I have taken issue with the approach of 
treating concrete-mindedness, variability, frus- 
tration-tolerance, and so on, as if they were 
essences of the person operating relatively in- 
dependently of field conditions [cf. 8]. More- 
over, there is so little known about these con- 
cepts and so little known about rigidity that 
I see little point in relating or correlating one 
mystery and abstraction with another. It only 
shifts the problem to a different area. 

Psychologists are interested not only in re- 
lating rigidity to other constructs but in refin- 
ing the methods of measuring rigidity. Now, 
I wish to make it clear that I do not think 
that there is anything inherently wrong with 
attempting to determine within a short period 
of time, a few hours of testing, the probability 
that an individual will shift his behavior in 
real life situations in order to meet changing 
circumstances. Indeed, it was for such reasons 
that I developed the manual on evaluation of 
rigidity of behavior [7]. But it should be 
pointed out that at best the tests in this manual
may test rigidity of behavior; it is not purported
that they test or measure rigidity inherent in the
personality. Moreover, while the
tests may have some predictive value in the 
clinic, the manual does not make any pretenses 
of explaining the phenomena underlying rigidity
in behavior. There is a tendency in psychology
to confuse measurement and prediction
with understanding or explanation. One may 
be able to measure and predict without having 
much understanding of the phenomena in- 
volved. 

Since so little is known about rigidity, 
should we be devoting all our time to measur- 
ing, with more and more refined tools, that 
which we do not understand? At the present 
time the most fruitful approach seems to me to 
involve intensive observation of and experi- 
mentation with rigidity of behavior under vari- 
ous conditions, if possible suspending biases as to 
the nature of the behavior involved, whether 
the biases stem from psychoanalysis or any 
other theory. Not only tests of the Einstellung 
type, but habit interference tasks, as well as 
actual clinical situations, for example, group 
therapy, should be observed. The aim should 
be to vary conditions systematically and to ob- 
serve what happens. As a final step—and not 
as a first step, as is so common today—one 
may be able to propose an explanation for 
such behavior. 

Without giving up my investigations involving
human beings, I am at present concentrating
on utilizing a similar approach in studying
rigidity of behavior in animals, particularly 
rats. Thus, one of the problems with which we 
are now concerned at McGill involves the rear- 
ing and observation of rats in the so-called 
free-environment popularized by Hebb [3, pp.
296-299], and then studying the rats’ behavior
in various test situations which allow for differ- 
ent degrees of rigidity in behavior. Other prob- 
lems in which we are presently interested are 
the determination of how to change conditions 
in order to produce a high degree of mechani- 
zation in rats that previously showed less, and 
the other way around. The conditions being 
varied involve hunger, brain damage, methods 
of learning, other experience, as well as the en- 
vironments in which the rat is raised and the 
environments in which he is kept during and 
after testing for E effects. While we are aware 
that such research may not be directly appli- 
cable to human subjects, it seems to us that it 
has merits in its own right, whether or not it
may yield insight into the behavior of other 
species. Moreover, it is simpler to control the 
past experience and field conditions to which 
our rats are subjected than would be the case
with human subjects. As a case in point: hu- 
man subjects might conceivably object to the 
brain operations or the prison environment 
which a proportion of the rats undergo. Fin- 
ally, it would not be out of place to mention 
that psychoanalytic thinking has so far not per- 
vaded the rat realm to the extent that rigidity 
in behavior must immediately be ascribed to 
something in the rat—say, to rigidity in the 
rat’s “personality” or “ego defense system.” 


Received July 5, 1950. 


REFERENCES

1. Frenkel-Brunswik, Else, and Sanford, R. N.
Some personality correlates of anti-Semitism.
J. Psychol., 1945, 20, 271-291.

2. Frenkel-Brunswik, Else. Intolerance of ambiguity
as an emotional and perceptual personality
variable. J. Personality, 1949, 18, 108-143.

3. Hebb, D. O. Organization of behavior. New
York: Wiley, 1949.

4. Hilgard, E. R. Theories of learning. New York:
Appleton-Century-Crofts, 1948.

5. Luchins, A. S. Mechanization in problem solving.
Psychol. Monogr., 1942, 54, No. 248.

6. Luchins, A. S. Proposed methods of studying
degrees of rigidity in behavior. J. Personality,
1947, 15, 242-246.

7. Luchins, A. S. An examination for rigidity of
behavior. New York: Mimeographed manual
and material distributed by New York Regional
Office of the Veterans Administration, 1948.

8. Luchins, A. S. Rigidity and ethnocentrism: A
critique. J. Personality, 1949, 17, 449-466.

9. Luchins, A. S., and Luchins, Edith H. New
experimental attempts at preventing mechanization
in problem solving. J. gen. Psychol., 1950,
42, 279-297.

10. Rokeach, M. Mental rigidity and ethnocentrism.
J. abnorm. soc. Psychol., 1948, 43, 259-277.

















THE ACCURACY OF SELF-EVALUATIONS: 
ITS MEASUREMENT AND SOME OF ITS 
PERSONOLOGICAL CORRELATES 


ROBERT R. HOLT 


THE MENNINGER FOUNDATION


TOPEKA, KANSAS 


ONE of the standard ways of studying
personality is through the self-concept,
or at least that aspect of it which the
subject is willing to reveal in self-evaluations.
Recently, the self-rated questionnaire has fall- 
en into disrepute in some circles; it has been 
argued that most people are so self-deceptive, 
or so defensive about admitting many of the 
truths they know about themselves, that self- 
evaluation can be used only indirectly and 
cannot be taken at face value. 


Such a statement cannot be called simply true
or false. There appears to be a tremendous range
of variation, from the completely self-deluded 
to the rare man with an accurate appraisal of 
himself. In a dynamic psychology, variations in 
the phenomenon of self-evaluation and its inter- 
relations with other variables of personality 
are more interesting than an assessment of its 
general level for any group as a whole. 


Insight will be used in this paper to mean 
“the degree to which self-evaluations are ac- 
curate.” The insight that psychotherapists talk 
about, in contrast, should not be thought of as 
a mental state, an endowment, or attainment 
like one’s level of intelligence, but as an emo- 
tional as well as intellectual process—a con- 
tinuing journey defined by its direction rather 
than by an ultimate goal.¹ When the therapist
says he wants his patient to become more in- 
sightful, he means that he wants to help the 
patient achieve an equilibrium which will per- 
mit him to continue, on his own, a process of 
self-understanding. 


¹My friend, Dr. Paul Bergman, suggested this
formulation, for which I am grateful. 




When G. W. Allport first proposed a meth- 
od for measuring insight, in 1921 [1], he had 
no such conception in mind. Get a mathemati- 
cal expression, he said, of the relation between 
self-judgments of personality characteristics 
and the judgments of objective scientists, who 
presumably come close to the truth. When 
Sears [6] made his pioneering attempt to study 
relationships between self-evaluations and the 
judgments of others, he too spoke of insight but 
meant merely the accuracy of statements a per- 
son is willing to make about himself to a re- 
lative stranger in whom he has none of the 
trust and faith that a neurotic must have in his 
psychotherapist. It may be that the therapist’s 
concept of dynamic process-insight and the experimentalist's operationally defined product-insight are related. Perhaps the former is a
necessary precondition for the latter. The task 
of the present communication, however, is 
limited to the study of an insight that is de- 
fined as the accuracy of experimentally ob- 
tained self-evaluations. 

There are various levels of this kind of insight, depending on the aspects of himself a person is called upon to assess, which might
range from external facts like one’s skin color 
to half-conscious motives. Different apparent 
accuracy would be brought about by variations 
in the situation in which self-judgments are 
called for—the nature of the experimenter, his 
relationship to the subject, the latter’s under- 
standing of the use to which the data were to 
be put, the public or private nature of the 
situation, etc. The number of judgments to be 
made must also have some effect on the result- 
ing measure as long as perfect generality of 








96 ROBERT R. HOLT 


insight is not assumed. These considerations 
make it necessary to specify in detail the pro- 
cedure followed, and to urge caution in gen- 
eralizing the results of the present study. 


THE MEASUREMENT OF INSIGHT 


Procedure. The method used was basically that 
urged by Allport. A comparative research project 
at the Harvard Psychological Clinic in 1941 and 
1942 afforded an unusually good chance to meet the 
requirement of good criterion ratings against which 
to validate the self-ratings. During two years, 20 
psychologists, psychiatrists, and other social scien- 
tists, led by Henry A. Murray, devised new tech- 
niques and re-explored old ones for the study of per- 
sonality. The subjects (Ss) were 10 more or less 
healthy college men, each of whom was seen by the 
experimenters (Es) for a minimum total of 40 hours; 
Ss were paid for their time. All Es learned a system
of 148 personological concepts similar to those de- 
scribed in Explorations in Personality [4], and 
worked in terms of these variables, the final ratings 
on which were taken as criterion in the present 
study. A six-point scale (0-5) was used for all variables, ratings being made in terms of the total population of Harvard undergraduates.² The Diagnostic Council of 10 members formed two groups and reached an independent consensus for each S by
pooling and discussing differences between the mem- 
bers’ independent ratings; the over-all reliability of 
ratings was about .80. In meetings of the Council, 
differences between groups were ironed out, Ss were 
ranked for each variable, and final scores were 
given, without knowledge of the Ss’ self-ratings. 


Such consideration has seldom been given to the 
rating of a group of Ss on an array of needs, traits, 
facts of past history, abilities, defense mechanisms, 
etc. Another advantage was that this internally con- 
sistent set of variables had been used for a year or 
so by all of us in our clinical work and had been 
thoroughly assimilated (with a few exceptions, to be 
noted later). 


The self-ratings were obtained during one of the 
Ss’ initial visits to the Clinic, after they had been ac- 
quainted with the general nature of the experiments 
—to study normal personality—and had been as- 
sured of anonymity and the confidence in which all 
data would be kept. The Ss agreed to be as frank as 
possible. When ratings were obtained, the Ss went 
individually to a room where they were met by a 
pleasant, motherly woman who showed them the 
forms to be filled out, and then retired to the other 
side of the room ostensibly to read a book. From her 
reports, it was plain that the Ss generally forgot her 
unobtrusive presence. The ratings considered here 
were obtained from a blank called Common Forms 
of Behavior. It consisted of 36 items, each of which 
was the definition, mostly in behavioral terms, of 


² For further description of the scale, see Murray
[5, p. 32]. 


one of the overt needs,³ followed by a blank space
for a rating of the degree to which the described be- 
havior was true of oneself. Ss used the same 0-5 rat- 
ing scale and the same reference population as did 
the Es. 


Even with the assumption that our criterion 
ratings (called E-ratings hereafter) are high- 
ly valid, three types of questions are raised by 
the present approach to the measurement of 
insight. 


1. Which personality variables should be used? 
Which kinds would give the best measure of the 
most important kind of insight? Such a question had 
probably never been asked in a study of this kind. 
That there may be different “insights” in the same 
personality, varying with the area that is evaluated, 
had been recognized.⁴ But it is customary in the literature to speak always of insight whether overt
behavioral tendencies, covert needs, abilities, or what 
not are being judged, as if it made no difference. 
The results here pertain only to overt needs; it would be interesting to compare insights of the various kinds mentioned, but unfortunately this was not feasible.


All but one of the 36 needs were included in the 
final measure of insight. The excluded one was the 
n (need for) Retention. The sum of the squared de- 
viations between S- and E-ratings of this need was 
eight times as great as in the case of the variable 
showing the best agreement, n Rejection. Not only
was this discrepancy measure far greater for n Re- 
tention than for any other, but the Ss who were 
furthest from E-ratings of it were the ones who were 
most insightful by the criterion of the other 35 needs. 
Consequently it was discarded. 

2. What statistic is best for an insight measure? 
To express mathematically the correspondence be- 
tween two sets of numbers, the product-moment cor- 
relation coefficient is the first method that comes to 
mind, and it was the first that was used. The re- 
sults were suspect, however. For example, it was 
easy to see that one S, Dupressy, came close to the
mark most of the time, and that he rarely went off 
more than one point. Another S, Nailson, on the other 
hand, had numerous discrepancies of 2 or 3 points. 
Yet, r was .72 for Nailson, .19 for Dupressy, giving 
them ranks in the group of 1 and 9 respectively! 
How could this be?⁵


³ For definitions of overt need, see Murray [4, pp. 123-124, 252]. In the self-rating form, the needs were
not named; the definitions were substantially the 
same as those given in the glossary (see footnote 6). 
An additional 10 items, constituting definitions of 
such variables as Super-ego Integration and some 
miscellaneous needs not rated by the Diagnostic 
Council, were interspersed throughout the form. 


⁴ See Sears [6].


⁵ For the following explanation I am indebted to
my friend, Dr. Daniel Horn. 








THE ACCURACY OF SELF-EVALUATIONS 97 


The product-moment formula is a function of the 
squared deviations between the paired scores, and 
the variance of the two distributions. This means 
that each of the two distributions of scores being 
correlated is corrected for its own degree of varia- 
bility so they will be comparable, i.e., the scales used 
in the two sets of ratings are equated. Dupressy’s 
marks (both S- and E-ratings) fell within a narrow
range, which magnified little discrepancies; but 
Nailson’s E-scores varied greatly, usually in the 
same direction as his own mark but much further, 
so that he came out well when the range of the E- 
scores was statistically reduced. 
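The Dupressy-Nailson reversal can be reproduced with a few lines of arithmetic. The sketch below computes both the product-moment r and the raw sum of squared deviations on the common 0-5 scale for two hypothetical subjects: one whose ratings stay within a narrow range and never miss by more than a point, and one whose E-ratings run in the same direction as his own but much further. The ratings are invented for illustration and are not the study's data; the function names are likewise hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Product-moment correlation: each set is standardized by its own spread."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def sum_sq_dev(self_ratings, e_ratings):
    """Raw disagreement on the shared 0-5 scale; neither set is rescaled."""
    return sum((s - e) ** 2 for s, e in zip(self_ratings, e_ratings))


# Hypothetical ratings on six needs (NOT the study's data).
# "Close" subject: narrow range, never more than one point off.
close_s = [2, 3, 2, 3, 2, 3]
close_e = [3, 3, 2, 2, 2, 3]
# "Spread" subject: E-ratings go the same way as the S-ratings, but further.
spread_s = [1, 4, 2, 3, 1, 4]
spread_e = [0, 5, 0, 5, 0, 5]

for label, s, e in [("close", close_s, close_e), ("spread", spread_s, spread_e)]:
    print(label, round(pearson_r(s, e), 2), sum_sq_dev(s, e))
```

Running it prints `close 0.33 2` and `spread 0.93 12`: r favors the second subject even though his total squared discrepancy is six times larger, because each set of ratings has first been rescaled by its own variability.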

For many situations, this kind of equation of dis- 
tribution is highly desirable; in the present case it 
was not. The E- and S-ratings had already been 
equated for scale since both were made in terms of 
the same 0-5 scale, 0 meaning very much below the 
average of Harvard undergraduates, 5 very much 
above. What was needed was a statistic which could 


reach a maximum value only when scores were 
identical. 
The intraclass correlation coefficient (r') meets this requirement, but it still involves correcting for the variance in the scores to get a coefficient ranging from +1 to −1. Since the coefficient of insight did not need to have such a property in this case, a simple measure was used: the sum of the squared deviations of S- from E-ratings (squared to eliminate signs and to weight big discrepancies more heavily). This measure (referred to hereafter as Insight, with a capital I to distinguish it from the general concept) correlated .70 with Dr. Murray's ranking of Ss in order of their "insight into overt aspects of one's personality," more highly than did the rankings based on r or r'. Finally, its personological correlates made more sense than those of the rank-orders based on r or r'.
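The contrast among the three statistics can be sketched as follows. The intraclass formula used here, r' = 2Σ(x − m)(y − m) / [Σ(x − m)² + Σ(y − m)²], with m the grand mean of all 2n ratings, is one textbook form for paired scores; the article does not state which computation was used, so treat it as an assumption. Function names and ratings are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Product-moment r: unaffected by a constant shift of either set."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def intraclass_r(x, y):
    """Intraclass r for pairs: deviations are taken from the grand mean of all
    2n ratings, so it equals 1 only when the two sets are identical."""
    m = (sum(x) + sum(y)) / (2 * len(x))
    num = 2 * sum((a - m) * (b - m) for a, b in zip(x, y))
    den = sum((a - m) ** 2 for a in x) + sum((b - m) ** 2 for b in y)
    return num / den


def insight_discrepancy(s, e):
    """Sum of squared S-minus-E deviations (smaller = more accurate)."""
    return sum((a - b) ** 2 for a, b in zip(s, e))


s = [1, 2, 3, 4]        # hypothetical self-ratings
shifted = [2, 3, 4, 5]  # hypothetical E-ratings, one point higher throughout
print(pearson_r(s, shifted), intraclass_r(s, shifted), insight_discrepancy(s, shifted))
print(intraclass_r(s, s), insight_discrepancy(s, s))
```

For a constant one-point shift, r is 1.0, r' falls to about .67, and the sum of squared deviations is 4; only identical ratings give r' = 1 and a discrepancy of 0. Because r' still divides by the spread of the scores, it does not order subjects purely by their raw agreement with the criterion.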


These details are included as an example of the 
difficulties met when one uses a conventional statistic 
without considering the relevance of its underlying 
assumptions to the data at hand. 


3. How can the patterning of personality variables 
be considered? The measure finally chosen unfortun- 
ately provides no solution for the problem of super- 
summativeness. Even if a subject marks every need 
correctly, he may conceivably have an erroneous no- 
tion of their interrelationships in his personality: 
which ones are more basic, which subsidiary; what 
themas they go to make up; what infantile com- 
plexes they are related to, etc. But the difficulties of 
quantifying this kind of insight resemble those of 
measuring “process-insight.” Therefore, it should be 
repeated that the Insight considered here is intellec- 
tual, cross-sectional, and somewhat atomistic, con- 
sisting of the S’s ability to assay objectively (or with 
the same biases as the Diagnostic Council) the overt 
motivational trends in his personality. We may un- 
derstand better what kind of phenomenon this is if 
we look at the aspects of personality to which it is 
related. 


TABLE 1

PERSONOLOGICAL CORRELATES OF INSIGHT

Rated variable                              rho
Organizing Intelligence                     .77ᵃ
Originality of Thought                      .49

Athletic Ability                            .75
Athletic Achievement                        .67
Endurance                                   .54

n Excitance: Adventure                      .56
n Harmavoidance                            −.50
Sentiment for Subjectivity and Passion      .58
Exocathection                               .52
n Change: Novelty                           .46
Breadth of Interest                         .62

n Dominance: Conduct                        .66
Leadership (in Adolescence)                 .66
Social Adjustment                           .51
n Affiliation: Diffuse                      .51
n Infavoidance                              .49
Social Ability                              .48
Covert n Aggression                        −.61
fantasied p Dominance: Coercion             .78

Projection                                  .50

ᵃ These are rank-order correlation coefficients. When N = 10, rho must be .64 to be significantly different from zero at the 5 per cent level, .72 for the 1 per cent level.
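The coefficients in Table 1 are rank-order correlations. A minimal sketch of Spearman's formula for untied ranks, rho = 1 − 6Σd² / [N(N² − 1)], follows; the helper names and rank orders are hypothetical, and tied ranks (which would require averaged ranks) are assumed away. For a discrepancy score such as Insight, where a smaller sum means more accurate self-rating, the scores would be negated before ranking so that rank 1 goes to the most insightful S.

```python
def to_ranks(scores):
    """Rank 1 = highest score. Assumes no ties (a simplifying assumption)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranks = [0] * len(scores)
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    return ranks


def spearman_rho(rank_x, rank_y):
    """Spearman's rank-order coefficient for untied ranks."""
    n = len(rank_x)
    d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * d2 / (n * (n * n - 1))


# Hypothetical rank orders for N = 10 subjects (not the study's data).
insight_ranks = list(range(1, 11))
trait_ranks = [2, 1, 3, 4, 5, 6, 7, 8, 10, 9]
print(round(spearman_rho(insight_ranks, trait_ranks), 3))  # 0.976
```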


PERSONOLOGICAL CORRELATES OF ACCURATE SELF-EVALUATION


Table 1 presents all of the rated variables of personality⁶ with which Insight was appreciably correlated. It is reasonable to find that the most intelligent Ss knew themselves best. Of the five aspects of intelligence rated separately, only the initial pair of variables correlated above .07 with Insight. They refer to creative facets of intellect; it would be interesting if further studies should bear out the implication that Insight may be a cause or a result of the


⁶ Definitions of these and most of the 148 personological variables may be found in a glossary which
was eliminated from the present paper to save print- 
ing costs. Order Document 3124 from American 
Documentation Institute, 1719 N Street, N. W., 
Washington 6, D. C., remitting $1.00 for microfilm 
(images 1 inch high on standard 35 mm. motion 
picture film) or $1.50 for photocopies (6 x 8 inches) 
readable without optical aid. Definitions similar to 
those used in the study for most of the variables 
may be found in Murray [4]. 










ability to reorganize experience constructively.⁷

The second group of correlates in Table 1 might represent some factor of constitutional strength. These athletic attributes probably just indicate the sturdier members of this rather non-athletic group of Ss, which contained no member of a college team. But the constitutional hypothesis may not be warranted, and the second and third groups of variables in Table 1 might be considered together as a larger syndrome of active adventurous living in the world of reality. Our typical insightful S is adventurous rather than timid; he turns away from the inner life to a diversity of interests in the real external world. This finding recalls G. H. Mead's theory that a person gains self-knowledge through learning more about others. Certainly when self-ratings are made in relation to the population of one's contemporaries and on observable forms of behavior, we should expect the introverted, timid, and narrow person to do poorly at it. He might, on the other hand, do very well if insight were being measured in terms of the covert forces he discerns in his soul-searching.


In light of these points, it should be expected 
that in our group of Ss, dominance and social 
adjustment are related to the present measure 
of Insight, as shown by the fourth group of cor- 
relates. Insight thus goes not only with enough 
strength in body and mind to control others, 
but also with a basically friendly attitude to- 
ward them, free of resentment or feelings of 
being pushed around. 


In discussing these correlations, statements 
have been made imputing a causal relationship, 
where it seems the most natural explanation of 
the correlation. This is risky, because in any 
one of these correlations the causal relationship 
may go in one direction in one case and in the 
other in another case, with the most complex 
kind of interaction in a third. Furthermore, 
some of the results are most probably specific 
to this particular group of Harvard students, 


⁷ In this connection, it is noteworthy that G. H.
Mead’s theory of the self [3] posits an intimate con- 
nection between what he calls “self-consciousness” 
(taking the role of the other in relation to the self) 
and the ability to organize ideas and think con- 
structively. My wife, Dr. Louisa P. Holt, brought 
this similarity to my attention. 




whose homogeneity may have allowed relation- 
ships to emerge that would not hold, would be 
obscured, or would be expressed differently in 
another group. 


THE RELATION BETWEEN INSIGHT AND 
PROJECTION 

The last correlate, Projection, is an unexpected one. So challenging a relationship cannot be brushed aside on the grounds of statistical unreliability. But isn't it a paradox? Doesn't projection involve a lack of accuracy in judgment that is antithetical to insight? In his experiment on the attribution of traits [6], Sears found that his least insightful Ss projected the most; his result has been widely accepted as proof that insight and projection are incompatible.

The desirability of the needs rated. Closer examination shows, first, that Sears' rather crude measures, both of projection and of insight, were different from those used in the present study. His technique allowed him to state only that a subject was insightful or uninsightful, and projective or not. Secondly, all four variables he used were rated as undesirable by his subjects: stinginess (cf. n Retention), disorderliness (n Order), stubbornness (n Autonomy: Resistance), and bashfulness (n Infavoidance). By contrast, the variables used in my measure of Insight were well distributed between desirable, undesirable, and intermediate, according to the Ss' own ratings. When a measure of Insight based on the six most desirable variables is calculated (sum of the squared deviations), it correlates +.54 with Projection, while a similar measure based on the six least desirable needs correlates only +.04. Insight into the 23 intermediary variables correlates +.36, so that there appears to be a linear relationship between the social acceptability of needs and the degree to which projective persons self-rate them accurately. This finding does not seem to be based on a greater general accuracy of self-rating on the most desirable needs (see below).


How then are we to understand it? When a 
tendency is thought shameful or if recognition 
of it in oneself arouses anxiety, the demand to 
rate the amount of it that one possesses is a 
threat to self-esteem to which people will react 
















according to the nature of their principal 
defense mechanisms. If projection is particular- 
ly characteristic of a person, he will tend to at- 
tribute the quality in question to others, by 
contrast to whom he will then seem to have 
little of it. His self-ratings, particularly if they 
are made in comparative terms, should there- 
fore tend to be inaccurate. No such interference 
should enter in when desirable characteristics 
are being rated, because there is not as much 
of a threat to arouse the projective defenses. 
These predictions are well borne out by the data. The two Ss who were rated most projective underestimated in themselves those needs which they considered least acceptable socially and personally (as determined by a separate series of ratings using the Common Forms of Behavior blank): their respective average discrepancies were −.7 and −.8. Contrariwise, the two least projective Ss overestimated the needs which they considered least acceptable, their mean deviations being +.8 and +1.4. When the highly desirable needs are considered, both projective and nonprojective Ss overrate themselves by similar amounts: projective +.5, +.9; nonprojective +.5, +1.2.
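The under- and overestimates above are mean signed discrepancies: unlike the squared deviations that make up Insight, they keep the direction of each error. A minimal sketch, assuming ratings stored by need name; the need names and values are hypothetical, and the choice of which needs count as least acceptable (taken in the study from the Ss' own ideal-man ratings) is simply passed in.

```python
def mean_signed_discrepancy(self_ratings, e_ratings, needs):
    """Average of (S-rating minus E-rating) over the chosen needs.
    Negative = the subject rates himself lower than the Es do."""
    diffs = [self_ratings[n] - e_ratings[n] for n in needs]
    return sum(diffs) / len(diffs)


# Hypothetical ratings for one subject (not the study's data).
s = {"need_a": 1, "need_b": 2, "need_c": 1, "need_d": 4}
e = {"need_a": 2, "need_b": 2, "need_c": 3, "need_d": 3}
least_acceptable = ["need_a", "need_b", "need_c"]  # hypothetical selection
print(mean_signed_discrepancy(s, e, least_acceptable))  # -1.0
```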

Insight, Projection, and Intelligence. There 
still remains the main problem: why does the 
projective person do well in self-estimation 
when there is no problem of the social undesir- 
ability of what he rates? Perhaps because of 
the generalized sharpness, the observational 
acuity, and mental alertness that clinically are 
associated with projective tendencies. But these 
characteristics almost comprise a definition of 
certain aspects of intelligence. In diagnostic 
testing, a common working hypothesis is that 
projective trends may elevate a person’s scores 
on certain subtests of the Wechsler-Bellevue 
Scale (Arithmetic, Similarities, Picture Com- 
pletion). And indeed, Projection turns out to 
be correlated with Organizing Intelligence; 
rho is +.48. If Organizing Intelligence is then held constant by partial correlation,⁸ the correlation between Insight and Projection drops to +.23.
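The drop to +.23 can be checked with the standard first-order partial correlation formula, applied to the three coefficients given in the text (Insight with Projection, .50; Insight with Organizing Intelligence, .77; Projection with Organizing Intelligence, .48). The article does not print the formula it used, and applying it to rank-order coefficients is an approximation, since it was derived for the product-moment r.

```python
from math import sqrt


def partial_r(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y with z held constant."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Coefficients reported in the text and in Table 1.
r_insight_projection = 0.50
r_insight_intelligence = 0.77     # Insight with Organizing Intelligence
r_projection_intelligence = 0.48  # Projection with Organizing Intelligence

r = partial_r(r_insight_projection, r_insight_intelligence, r_projection_intelligence)
print(round(r, 2))  # 0.23
```

With these inputs the result is about .233, which rounds to the +.23 reported.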


In this particular sample there is probably an unusually high mutual dependence of Projection and intelligence, which may be a principal explanation of the correlation originally found between the experimental measure of Insight and the rated variable, Projection. Even when a correction is made for this overlap, however, the relationship remains a positive one. The principal interest of these findings is that they demonstrate that a positive association between Insight and Projection can exist in a certain type of population, and they point to the need for further research in which the measures of both traits should be systematically varied as well as carefully controlled.

⁸ I owe this fruitful suggestion to my friend, Dr. George Klein.


VARIATIONS IN INSIGHT ACCORDING TO VARIABLES RATED

There were considerable differences between the variables in the ease with which the subjects as a group could give themselves accurate marks. The mean squared deviation between E-ratings and S-ratings for all 36 variables and all 10 Ss was 2.21. The five best-rated needs had a mean D² of .9; they are, in order: n Rejection, n Dominance: Ideas, n Affiliation: Diffuse, n Dominance: Conduct, and n Excitance. The five worst-rated needs had a mean D² of 3.86; they are: n Seclusion, n Cognizance: Curiosity, n Change: Novelty, n Playmirth, and, worst of all, n Retention. Dominant and affiliative needs, which are correlated in this group with Insight, were also among the easiest for the Ss to judge validly for themselves.
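The per-variable figures in this section average squared discrepancies across subjects rather than across needs. The sketch below computes a mean D² for each variable and picks the k best- and worst-rated ones; the data layout (nested dictionaries keyed by subject and then by variable) and all values are hypothetical.

```python
def mean_d2_by_variable(s, e):
    """s, e: {subject: {variable: rating}}.
    Returns {variable: mean squared S-minus-E deviation across subjects}."""
    subjects = list(s)
    variables = list(s[subjects[0]])
    return {
        v: sum((s[subj][v] - e[subj][v]) ** 2 for subj in subjects) / len(subjects)
        for v in variables
    }


def best_and_worst(mean_d2, k=5):
    """Variables with the k smallest and k largest mean D² (worst listed first)."""
    ordered = sorted(mean_d2, key=mean_d2.get)
    return ordered[:k], ordered[::-1][:k]


# Hypothetical ratings for two subjects on three variables.
s = {"S1": {"x": 2, "y": 4, "z": 1}, "S2": {"x": 3, "y": 0, "z": 2}}
e = {"S1": {"x": 2, "y": 1, "z": 2}, "S2": {"x": 3, "y": 2, "z": 2}}
by_var = mean_d2_by_variable(s, e)
print(by_var)                       # {'x': 0.0, 'y': 6.5, 'z': 0.5}
print(best_and_worst(by_var, k=1))  # (['x'], ['y'])
```

Because every S rates every variable, averaging these per-variable means gives the overall mean D² of the kind reported above.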


When the best- and worst-rated variables are compared, a number of sources of error emerge. Embarrassingly enough, most of them seem to be attributable to the Es rather than to the Ss.


Sources of error: Definitions. In the case of several variables, errors in ratings seem to have been due to poorly worded definitions. The needs for Playmirth, Cognizance, and Change were in large part defined in terms of subjective feelings, moods, or attitudes rather than in terms of overt behavior, as they were supposed to be. It would seem more difficult to rate someone (yourself or another person, in relation to a given population) on the first of the following statements than on the second: "To enjoy good-natured jokes and jests" (n Playmirth); "To assemble, lead or organize a group" (n Dominance: Conduct). Other definitions suffered from including very diverse forms of behavior (n Change: "To seek new experiences . . . or to be somewhat changeable and inconsistent").


Influence of the desirability of needs. The 
Ss were asked to rate the desirability of each 
need, the extent to which it would be found in 
an “ideal man,” on the same Common Forms of 
Behavior blank. It could be seen that they had 
a tendency to overrate those items they rated 
as admirable and to underrate by about an equal 
amount the less acceptable needs. It seemed 
worth while to investigate the possibility 
that relatively neutral forms of behavior might 
be more accurately rated. The needs were then 
divided into three groups: six most desirable 
or ideal forms of behavior, six least desirable, 
and the remaining 23 intermediary variables. 
The mean squared deviation between E- and S-ratings is 2.25 for most desirable, 2.10 for indifferent, and 2.15 for undesirable ones. There
was thus only a negligible trend in the direc- 
tion of the hypothesis. That strongly valent 
needs were no more deviantly rated than others 
by this group points to the special nature of the 
small population dealt with. All were college 
men, most of them considerably above average 
in scholarship, and some of them were chosen 
for study partly because they had interesting 
talents. 
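The three-way split by desirability can be sketched the same way. The sketch assumes each need is placed by its mean "ideal man" rating across Ss (the text does not say exactly how the ratings were pooled) and requires 2k to be no larger than the number of needs; with k = 6 and 35 needs it would give the 6/23/6 division described. Names and ratings are hypothetical.

```python
def split_by_desirability(ideal_means, k=6):
    """Order needs by mean 'ideal man' rating (highest first) and split them
    into the k most desirable, the intermediate rest, and the k least desirable."""
    ordered = sorted(ideal_means, key=ideal_means.get, reverse=True)
    if 2 * k > len(ordered):
        raise ValueError("need at least 2k needs to form the two extreme groups")
    return ordered[:k], ordered[k:len(ordered) - k], ordered[len(ordered) - k:]


def group_mean_d2(s, e, needs):
    """Mean squared S-minus-E deviation over all subjects and the given needs."""
    d2 = [(s[subj][n] - e[subj][n]) ** 2 for subj in s for n in needs]
    return sum(d2) / len(d2)


# Hypothetical data: four needs, one subject.
ideal = {"a": 4.5, "b": 1.0, "c": 3.0, "d": 2.0}
most, middle, least = split_by_desirability(ideal, k=1)
print(most, middle, least)  # ['a'] ['c', 'd'] ['b']

s = {"S1": {"a": 5, "b": 0, "c": 3, "d": 2}}
e = {"S1": {"a": 3, "b": 1, "c": 3, "d": 2}}
print([group_mean_d2(s, e, g) for g in (most, middle, least)])  # [4.0, 0.0, 1.0]
```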

By contrast, consider the mean self-ratings 
of 99 men in the Army Specialized Training 
Program at Harvard who filled out the same 
Common Forms of Behavior blank.⁹ The same
scale was used so that if the group was fairly 
representative and the members marked them- 
selves accurately, the mean rating for each need 
should be 2.5. The following needs deviated by 
more than a full point from the mean: Under- 
rated: n Aggression: Physical, n Succorance, n 
Sex: Diffuse ; Overrated: n Affiliation: Focal, 
n Sex: Focal, and n Blamavoidance. Here the 
influences of the social acceptability of the form 
of behavior and of wish are very clear. Few 
people can assess correctly their own direct ex- 
pressions of aggression, sexuality, and depend- 
ence, while friendship, love, and moral be- 
havior are goals which most people are only too 


⁹ The following data were kindly made available
to me by Dr. Daniel Horn, who obtained the ratings 
from the Army Specialized Training Program. 




eager to say they pursue. 
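The ASTP check amounts to comparing each need's mean self-rating with the scale midpoint, 2.5, which is the expected mean only if the group is representative of the reference population and rates itself accurately. A minimal sketch with hypothetical need names and ratings; the one-point threshold is the one used in the text.

```python
def flag_deviant_needs(ratings_by_need, expected=2.5, threshold=1.0):
    """Return (underrated, overrated) needs whose mean self-rating differs
    from the expected mean by more than `threshold` points."""
    means = {n: sum(r) / len(r) for n, r in ratings_by_need.items()}
    under = [n for n, m in means.items() if expected - m > threshold]
    over = [n for n, m in means.items() if m - expected > threshold]
    return under, over


# Hypothetical self-ratings from four respondents (not the ASTP data).
ratings = {
    "need_p": [1, 1, 2, 1],  # mean 1.25
    "need_q": [2, 3, 3, 2],  # mean 2.5
    "need_r": [4, 4, 3, 4],  # mean 3.75
}
print(flag_deviant_needs(ratings))  # (['need_p'], ['need_r'])
```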

Familiarity of Es with variables rated. To 
return to the original sets of data: Three of the 
five needs on which agreement is poorest are 
among a group of five which were added at the 
end of the study. Thus, we did not have these 
variables specifically in mind when dealing with 
the Ss, and they were not put through the 
process of so many ratings, justifications, and 
discussions. Feeling that we knew the Ss very 
well, and that the forms of behavior in ques- 
tion had all actually been observed or inquired 
into, we made ratings on the needs for Acquisition, Physical Aggression, Cognizance: Curiosity, Playmirth, and Retention.¹⁰ The last
three were among the four needs into which 
the Ss had poorest insight, if our judgments 
were a criterion. Now, it is likely that a person 
would know whether or not he was saving and 
thrifty and tended to hoard things, while the Es 
probably overlooked the fact that the definition 
was stated only in terms of physical objects and 
perhaps were influenced by clinically observable 
retentiveness of information and affection. The 
definition of n Playmirth almost invites one to 
rate his own sense of humor—a matter in 
which people are notoriously self-deceived. All- 
port [2, p. 224] found, for example, that 94 
per cent of a group of college students thought 
that their sense of humor was at least as good 
as average. If all the discrepancies were posi- 
tive, such an explanation would be plausible, 
but actually four underrated and four over- 
rated. 


Observability of relevant behavior. Aside 
from the difficulty caused by the subjective 
elements in the definition of n Playmirth, it 
may be presumed that the Es had insufficient 
opportunity to observe the Ss in a relaxed, 
natural setting in which spontaneous, humorous 
playfulness might assert itself. This comment 
applies even more to the need for Seclusion, 
which is obviously difficult to observe. Con- 
trariwise, turning to the variables well self-rat- 
ed, the results are easiest to explain on the as- 
sumption that the self-ratings were generally 
valid, and that the Es were able to approximate 
them closely because the kinds of behavior rated were involved directly in the interaction between E and S in the Clinic. Situations were


¹⁰ The need for Exposition, which was not self-rated, was also added.













provided in which the Es could directly observe 
Ss arguing and attempting to influence behavior 
and ideas of others (n Dominance: Ideas, and 
n Dominance: Behavior), as well as their 
generalized friendliness or aloofness (n Affiliation: Diffuse, and n Rejection). It is more difficult to explain the good agreement on n
Excitance. It was not well defined and not too 
easily observable, though it had the possible 
slight advantage of being neither a very highly 
prized nor an undesirable form of behavior. 
The conclusions suggested by these last find- 
ings are not particularly novel; they reaffirm 
what was already known about personality rat- 
ings. Definitions of variables must be clear and 
single-pointed. When ratings are being made 
on a scale which implies comparison of the sub- 
ject (or self) with a group, the variable is best 
defined in terms of overtly observable behavior, so that a person can be expected to make comparisons without resorting to guesswork. (Of course, the nature of some variables inevitably makes this kind of definition inappropriate.)
Finally, no matter how well a group of Es may 
feel they know their Ss after exhaustive study, 
there are dangers in adding variables to be rat- 
ed after the data have all been gathered and 
analyzed. Probably more valid results are ob- 
tained when all Es understand all variables 
clearly from the beginning and have them all in 


the back of their minds throughout their work 
with the Ss. 


SUMMARY 


1. Problems in the definition of Insight and 
the choice of a suitable statistical measuring 
instrument were discussed. The unsuitability 
of the most obvious technique (correlation of 
ratings) was demonstrated by reference to its 
underlying assumptions. The measure adopted 




was the sum of squared deviations between a 
subject’s self-ratings of 35 needs and criterion 
ratings by a Diagnostic Council. 

2. In a population of 10 college students, 
this measure of Insight was found to be related 
meaningfully as well as mathematically to 
measures of intelligence, active adventurous 
living in the world of reality, friendly dominance and social adjustment, and (possibly)
constitutional strength. 


3. The positive correlation of Insight and 
Projection was scrutinized in terms of the de- 
sirability of needs rated and the relationship of 
each to Organizing Intelligence. Even with the 
latter held constant, Insight and Projection 
were not antithetical. 


4. Dominant and affiliative needs were most 
accurately rated by the Ss taken as a group. 
Only slight and insignificant tendencies were 
found for these Ss to overrate their most high- 
ly prized needs and to underrate distasteful 
ones. The greatest discrepancies between E- 
ratings and S-ratings seem to be attributable to 
defects in the definitions of the variables. 


Received June 12, 1950 


REFERENCES
1. Allport, G. W. Personality and character. Psychol. Bull., 1921, 18, 441-455.
2. Allport, G. W. Personality, a psychological interpretation. New York: Holt, 1937.
3. Mead, G. H. Mind, self and society. Chicago: Univ. of Chicago Press, 1934.
4. Murray, H. A., et al. Explorations in personality. New York: Oxford Univ. Press, 1938.
5. Murray, H. A., et al. Assessment of men. New York: Rinehart, 1948.
6. Sears, R. R. Experimental studies of projection: I. Attribution of traits. J. soc. Psychol., 1936, 7, 151-163.








A PRELIMINARY TEST OF ROLE-PLAYING ABILITY¹


WILLIAM A. McCLELLAND 


BROWN UNIVERSITY 


In the past five years the general topic of role-taking or role-playing has been the concern of many sociometrists [5, 8, 11, 13, 14], psychologists working in the area of human relations [1, 9, 12] and in personality assessment [6, 7, 15, 18], and psychopathologists [3, 4, 10, 17]. The term "role-taking ability" refers to the facility with which a person can perceive and act out organized behaviors or roles (i.e., putting himself in someone else's position).

Personality theory does not as yet offer any evidence that "role-playing ability" is a trait. It may well be a combination of some more basic factors, for example, Guilford's social introversion-extroversion and objectivity. But it appears to be a useful concept in describing behavior, and it is for this reason that the current study was undertaken.


Despite a dearth of experimental evidence, 
certain speculative statements can be made 
about role-playing ability which may serve to 
make the concept clearer: 


1. It is largely a product of social interaction. 
Unless there has been previous experience with 
a particular role, a realistic portrayal of the 
role would be highly unlikely [16]. Although 
reading about various roles or experiencing 
them in other vicarious fashions is informative 
to the role-player, it appears safe to state that 
direct personal experience in social situations is 
the most important way in which the learning 
of these organized systems of behavior (or 
roles) takes place. It is obvious that sociom- 
etrists believe that training in “spontaneity” 
(the “ability to function adequately in a specific role” [8, p. 50]) can increase the facility
with which roles are taken. The suggestion by 
Sarbin [16] and by Gough [10] that role-taking practice be used as therapy with certain abnormal groups again points to the apparent importance of learning. Psychologists working in
the field of human relations [1, 9, 12] stress 
the value of role-playing as a training device 
in the development and improvement of social 
skills. The authors of a recent Commission of 
Community Interrelations research study [11] 
report that practice in role-playing was an aid 
in the training of depth interviewers. 


Although it is difficult to conceive of a de- 
finitive experiment as a result of which proper 
weights might be assigned, it appears entirely 
reasonable from other research on nature-nur- 
ture that there also exist important genetic fac- 
tors which determine the facility with which 
one can take roles. For example, somatic fac- 
tors, sensory and motor thresholds, the nature 
of the “neurological wiring” of the individual, 
are also likely determinants of role-playing 
ability. The importance of learning alone must 
not be overemphasized. 


2. It is a quantitative variable. There appear 
to be wide inter- and intra-individual differ- 
ences in ability to take roles. Sarbin [16], for 
example, has speculated that such heterogene- 
ous groups as actors, hysterics, and persons who 
are successful in face-to-face relations are good 
at role-taking. On the other hand, feeblemind- 
ed persons, schizophrenics, and some types of 
organic brain damage cases, he believes, are 
poor at it. Shor’s report of his experiences as a 
psychotherapist in a military NP section [17] 
tends to corroborate this. Cameron [3], in the 
development of the concept of a paranoid 
pseudo-community, leans heavily on the hy- 
pothesis that this nosological group (paranoia) 
has very poor role-taking ability. Gough [10] has hypothesized that the psychopathic personality is deficient in the capacity to look upon himself as a “social object,” a capacity which appears to be a derivative of role-playing ability. While sound experimental evidence is lacking on this point, a wealth of clinical data exists to support the contention that this ability varies in amount from person to person and probably within the same individual from role to role [15].

¹This paper was presented in part at the Eastern Psychological Association, in Worcester, Mass., 1950.


3. The concept appears to be a useful one despite the unclear nature of role-playing ability as a “trait” and despite lack of data on the question of modifiability of the “trait” with training. Many writers have found it helpful in describing behavior as well as an aid in theorizing about the nature of normal and deviate behavior [2, 3, 6, 7, 10, 16, 17, 18]. If we will admit testimonials and clinical observations as evidence, psychodrama appears to be helpful as psychotherapy, and role-taking training seems to have been valuable in the preparation of interviewers, in leadership training, in conference behavior, and as a counseling technique.


PURPOSE OF THE CURRENT STUDY 


These three points about role-taking ability 
should clarify the setting of the present study, 
which was designed to obtain some preliminary 
quantification of role-playing ability as a 
“trait.” It is not strange that aside from the 
“situation tests” described by both sociometrists 
and psychologists few attempts have been made 
to measure role-playing ability. It would seem 
feasible to begin making estimates of role-play- 
ing ability by observing actual role-playing. 
Franz [8] suggested in 1940 a test based upon 
a set of standard situations in which the differ- 
ent subjects played the set group of roles while 
the auxiliary egos played standard roles. This 
was done in the OSS assessment program some 
years later [2, 18]. The technique, actually a 
work-sample test, may well be preferable to the 
one described in this study, but as a diagnostic 
tool it is usually cumbersome, difficult to score, 
and time consuming. 


Dymond [6, 7] has recently developed a promising measure of empathic ability (i.e., “the imaginative transposing of oneself into the thinking, feeling and acting of another and so structuring the world as he does” [7, p. 127]). As this scale of empathic or role-playing ability requires a mutual set of self- and other-ratings by a pair of subjects, its use is limited in individual counseling or psychotherapeutic situations. Thus it is apparent that the existing methods of measuring this behavior all have limitations.


Before beginning an explicit discussion of 
methodology, it would be well to state more 
specifically the goals the author had in mind 
in initiating this study. In addition to the gen- 
eral problem of isolating and quantifying role- 
playing ability as an entity, it was firmly be- 
lieved that if a brief but reasonably valid scale 
could be derived, such an instrument would be 
of considerable utility in further research on 
the general topic of role-playing. If this short 
scale could be developed, it was further believed 
that it should be capable of incorporation with 
other procedures used in the assessment of po- 
tential interviewers, counselors, and clinicians 
or in any other circumstances where role-taking 
ability was believed or shown to be significant. 


The Minnesota Multiphasic Personality Inventory (MMPI) appeared to be a suitable vehicle for such a project. In view of its wide range of item content and of the fact that a number of new nonpathological scales have been derived from the 550-item pool (for example, social introversion, social status, dominance, and social tolerance), this test was selected for use in the current study.


SUBJECTS AND METHOD 


Three groups of subjects were utilized in this preliminary investigation. Group I consisted of 50 graduate students in psychology (31 men and 19 women) at the University of Minnesota. Of this group 22 were working towards the Master’s degree and 23 toward the Ph.D. The remaining five had attained the M.A. or M.S. and were no longer degree candidates. In chronological age two were under twenty and seven over thirty. About one-third of these graduate students had completed one year or less of graduate work, another third had completed two years, and the remainder had completed more than two years of graduate work. All had full-time or part-time employment as counselors or clinicians, or as psychology instructors and teaching (and research) assistants. The role-playing (RP) scale was derived from this group.










The second sample, Group II, consisted of 
50 Brown University undergraduate majors in 
psychology enrolled in a required course in psy- 
chological testing. This group was composed of 
33 women and 17 men; 43 were seniors and 7 
juniors. The third group was small. It consisted
of 11 advanced graduate students, three women 
and eight men, who were enrolled in a one-year 
practicum course in counseling at the Uni- 
versity of Minnesota. Data were obtained from 
Group I during June and July, 1948, from 
Group II in November and December, 1948, 
and from Group III in October, 1948, and in 
June, 1949. 


A pool of 97 items was assembled by the 
writer from the MMPI, to which were added 
16 original items. It was speculated that items 
expressing attitudes or feelings toward social 
situations would be more worth while and more 
economical to investigate than the entire 
MMPI pool. (This decision is probably an important limitation of the study.) Of the 97 MMPI items, 45 were not included in any clinical or validating scale, and the remainder were spread throughout all scales, particularly Hy (11 items) and D and Mf (7 items each).


After the scale had been administered individually to the 50 subjects in Group I, ratings of role-playing ability for each subject were obtained from the group members themselves. These ratings were used as the criterion measure of role-playing ability. A list of the 50 persons in Group I who had answered the 113 items was presented to each respondent. He was asked to select the six people he knew best (more, if he felt he knew them well enough) and rate them on six “traits.” These “traits,” believed to be important aspects of role-playing ability, were referred to as “an aspect of social behavior” in the rating instructions. Ratings were also obtained individually. It was not difficult for most respondents to rate six of their classmates or co-workers. Of the six “traits” on the scale, only five were used to obtain the criterion measure of role-playing ability. The sixth was dropped because the raters claimed they had no basis for making such a rating. (This question was designed to act as an anchor item and asked, “How capable is the subject in a role-playing situation?”) The other five items were as follows:

1. Social Intelligence. How well does he appre- 
ciate the demands of group and social situations? 


2. Reaction Sensitivity. How sensitive is he to the 
feelings and attitudes of others? 


3. Prediction of Behavior. How well can he pre- 
dict what others will do? Does he seem to know 
what actions others are going to take? 


4. Rigidity. How rigid and inflexible is he in in- 
terpreting social behavior? 

5. Self-Objectivity. Is he able to evaluate his own 
behavior objectively, that is, as others would evalu- 
ate it? 

Each of the five items was rated on a five- 
point scale—markedly above average, above 
average, average, below average, and markedly 
below average. Rating instructions suggested 
using people known to the rater as a frame of 
reference. Alternatives were weighted 5-4-3-2-1 (with 5 representing markedly above average), and the criterion measure of role-playing ability was the simple arithmetical average of ratings on all five items.
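The criterion score described above can be sketched in Python. This is a minimal illustration, not the author's procedure: the names `WEIGHTS`, `TRAITS`, and `criterion_score` and the data layout (one dictionary of trait to adjective per rater) are assumptions. The text does not say whether the Rigidity item was reverse-weighted, so the sketch applies the 5-4-3-2-1 weights to all five items exactly as stated.

```python
# Hypothetical sketch of the criterion score: weights 5-4-3-2-1 applied to
# the five rated "traits", then a simple arithmetic average.
WEIGHTS = {
    "markedly above average": 5,
    "above average": 4,
    "average": 3,
    "below average": 2,
    "markedly below average": 1,
}

TRAITS = (
    "social intelligence",
    "reaction sensitivity",
    "prediction of behavior",
    "rigidity",  # weighted like the others; the text states no reversal
    "self-objectivity",
)


def criterion_score(ratings: list[dict[str, str]]) -> float:
    """Average weighted rating over all raters and all five traits.

    Each element of `ratings` holds one rater's marks for a single ratee.
    """
    if len(ratings) < 2:
        # Only cases with two or more sets of ratings entered the item analysis.
        raise ValueError("at least two raters are required")
    values = [WEIGHTS[marks[trait]] for marks in ratings for trait in TRAITS]
    return sum(values) / len(values)


# Usage: one rater marks everything "above average", another "average".
rater_a = {trait: "above average" for trait in TRAITS}
rater_b = {trait: "average" for trait in TRAITS}
print(criterion_score([rater_a, rater_b]))  # 3.5
```

Averaging over all individual ratings gives the same result as averaging each rater's mean when every rater marks all five items.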

As only cases with multiple ratings (two or 
more) were used in the item analysis, four cases 
of the 50 were lost because only one set of rat- 
ings was obtained for them. Average ratings 
for these 46 cases ranged from 4.2 to 1.8, with 
a mean of 3.4 and a standard deviation of ap- 
proximately .3. Reliability of the ratings (odd 
vs. even raters) was .50 corrected to .66 by 
the Spearman-Brown prophecy formula. The 
reliability coefficient is rather low, but inas- 
much as the study is largely exploratory, the 
criterion is tentatively accepted despite the 
fact that its reliability leaves something to be 
desired. An item analysis with the upper and 
lower 27 per cent was performed using Jur- 
gensen’s tables for phi coefficients. In the ad- 
jectival terms of the ratings, the lower 27 per 
cent were “average” or “below average” 
(average ratings below 3.2) while the upper 
27 per cent were “above average” (average 
ratings above 3.7) in role-playing ability. It 
is unfortunate that the Ns in the two samples 
are so small (and for this reason if for no 
other the scale must be considered tentative), 
but the 50 cases investigated constituted the 
entire local population of persons who were 
acquainted with most of the other students in 
the group. The item analysis revealed that about 20 items differentiated “good” from “poor” role-players at the 20 per cent level, 13 at the 10 per cent level, 3 at the 5 per cent level, and 1 at the 1 per cent level. With a few “intuitive” changes and elimination of the three surviving original items (not included in the MMPI item pool), a 32-item scale was compiled. These items are presented in Table 1.
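Two computations from the paragraph above can be made concrete: the Spearman-Brown step-up applied to the odd-vs.-even rater correlation, and an item's phi coefficient in the upper and lower 27 per cent groups. The sketch below computes phi directly from a 2 × 2 table rather than reading it from Jurgensen's tables. The function names and the table layout are illustrative assumptions.

```python
import math


def spearman_brown(r_half: float, factor: float = 2.0) -> float:
    """Project a reliability coefficient to a measure `factor` times as long."""
    return factor * r_half / (1 + (factor - 1) * r_half)


def extreme_groups(scores: dict[str, float], fraction: float = 0.27):
    """Return (upper, lower) lists of case ids; ties at the cut are broken arbitrarily."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    k = round(len(ranked) * fraction)
    if k == 0:
        return [], []
    return ranked[:k], ranked[-k:]


def phi_coefficient(a: int, b: int, c: int, d: int) -> float:
    """Phi for a 2 x 2 table of item response by criterion group.

                    keyed answer   other answer
    upper group          a              b
    lower group          c              d
    """
    denominator = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    if denominator == 0:
        return 0.0  # an item everyone answers alike cannot discriminate
    return (a * d - b * c) / denominator


print(round(spearman_brown(0.50), 3))  # 0.667
```

With the 46 rated cases, `extreme_groups` keeps 12 cases at each end. Stepping up .50 gives about .667, which the text reports as .66.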


TABLE 1

PRELIMINARY FORM OF THE ROLE-PLAYING (RP) SCALE
(PERSONS SCORING HIGH WILL ANSWER ITEMS IN THE DIRECTION INDICATED)

1. Policemen are usually honest. True.
2. I tend to become interested in several different hobbies rather than stick to one of them for a long time. True.
3. I like to keep people guessing what I’m going to do next. False.
*4. I don’t blame anyone for trying to grab everything he can get in this world. False or ?.
5. I am often inclined to go out of my way to win a point with someone who has opposed me. False.
*6. I am against giving money to beggars. False or ?.
*7. If I were in trouble with several friends who were equally to blame, I would rather take the whole blame than to give them away. True or ?.
8. I tend to be on my guard with people who are somewhat more friendly than I had expected. False.
9. I seem to be as capable and smart as most others around me. True.
*10. If several people find themselves in trouble, the best thing for them to do is to agree upon a story and stick to it. False or ?.
11. I have no patience with people who believe there is only one true religion. False.
*12. I usually “lay my cards on the table” with people that I am trying to correct or improve. False or ?.
13. I enjoy many different kinds of play and recreation. True.
14. I wish I could get over worrying about things I have said that may have injured other people’s feelings. False.
15. It makes me impatient to have people ask my advice or otherwise interrupt me when I am working on something important. False.
16. I am apt to pass up something I want to do when others feel that it isn’t worth doing. False.
17. People can pretty easily change me even though I thought that my mind was already made up on a subject. False.
18. My way of doing things is apt to be misunderstood by others. False.
19. I am not likely to speak to people until they speak to me. False.
20. I have very few quarrels with members of my family. True.
*21. It is not hard for me to ask help from friends even though I cannot return the favor. False or ?.
22. I believe that my home life is as pleasant as that of most people I know. True.
*23. I think I would like the kind of work a forest ranger does. True or ?.
24. I am more sensitive than most other people. False.
25. I do not mind being made fun of. True.
26. I refuse to play some games because I am not good at them. False.
27. I should like to belong to several clubs and lodges. True.
28. At parties I am more likely to sit by myself or with just one other person than to join in with the crowd. False.
29. I have used alcohol excessively. True.
30. I wish I were not so shy. False.
31. Once in a while I feel hate towards members of my family whom I usually love. True.
32. I am liked by most people who know me. True.

*These items can be scored by stencil quite easily by a reverse scoring procedure. In item 4, for example, as long as the subject doesn’t say “True,” he has answered the item correctly. Circle these seven starred items on the stencil as a reminder to score them in this reverse fashion.
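The key in Table 1 can be turned into a scoring routine. The sketch below assumes answers coded "T", "F", and "?" (cannot say); the set and function names are illustrative. It treats item 7 as a reverse-scored item because its key, like those of the starred items, accepts two answers, which is consistent with the footnote's count of seven starred items.

```python
# Keys transcribed from Table 1 ("T" = True, "F" = False, "?" = cannot say).
TRUE_ITEMS = {1, 2, 9, 13, 20, 22, 25, 27, 29, 31, 32}
FALSE_ITEMS = {3, 5, 8, 11, 14, 15, 16, 17, 18, 19, 24, 26, 28, 30}
# Reverse-scored items: any answer except the opposite one earns the point.
TRUE_OR_CANNOT_SAY = {7, 23}
FALSE_OR_CANNOT_SAY = {4, 6, 10, 12, 21}


def is_keyed(item: int, answer: str) -> bool:
    """True if `answer` earns a point on `item` of the 32-item RP scale."""
    if item in TRUE_ITEMS:
        return answer == "T"
    if item in FALSE_ITEMS:
        return answer == "F"
    if item in TRUE_OR_CANNOT_SAY:
        return answer != "F"
    if item in FALSE_OR_CANNOT_SAY:
        return answer != "T"
    raise ValueError(f"item {item} is not on the 32-item RP scale")


def rp_score(responses: dict[int, str]) -> int:
    """Raw RP score: number of items answered in the keyed direction."""
    return sum(is_keyed(item, answer) for item, answer in responses.items())


# A subject who answers "?" to every item scores only on the reverse-scored items.
print(rp_score({item: "?" for item in range(1, 33)}))  # 7
```

Scoring the reverse items as "not the opposite answer" mirrors the stencil rule in the footnote, so a "?" on those items counts toward the score.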


The scale was then administered to Groups 
II and III as a partial check on its validity. 
It was planned to use Group II further as a 
source of data on the relation of the role-play- 
ing (RP) score to various personality and 
ability measures. Ratings of each other by the 
members of Group II were also obtained. It 
was felt that Group III (the 11 graduate 
students in the counseling practicum) offered 
an estimate of the RP scale’s possible validity 
as a predictor of counseling skill. These stu- 
dents took the scale in October, 1948, when 
they entered the course. They were closely ob- 
served in class and while actually counseling 
by expert counselors throughout the academic 
year. Multiple ratings of these students were 
obtained by the counselor supervisors upon 
completion of the course in June, 1949.


RESULTS 

Reliability. An estimate of the test-retest reliability of the RP scale was obtained from Group II. The subjects answered the RP questions twice: once as a separate 32-item test and a month later when they took the MMPI. The test-retest reliability for the 37 students for whom both scores were available was .69. Considering the usual difficulties involved in estimating the reliability of a personality test and the fact that the test was not taken in exactly the same form each time, this figure appears reasonably satisfactory.
Validity. The first of the two checks on the scale’s validity was not definitive. The ratings obtained from Group II did not prove at all satisfactory, for several reasons: the students did not know enough of their classmates to yield a satisfactory number of multiple ratings; the reliability of the multiple ratings that were obtained was low; and there was very little spread in the ratings. However, using the top 10 and the bottom 10 students on the average ratings (approximately the upper and lower 27 per cent) for calculating a t-test of the significance of the mean RP score differences (on the second, or MMPI, administration) yielded a result significant at the 15 per cent level of confidence: suggestive, but hardly conclusive, evidence of the scale’s validity.
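A two-sample t-test of the kind described above can be sketched as follows. The text does not say which form of t was used; the pooled-variance (Student) form here is an assumption, and `pooled_t` is an illustrative name. With 10 cases per group the test has 18 degrees of freedom, and the resulting t would be compared with a table of critical values (for example, 2.101 for the two-tailed 5 per cent level at 18 degrees of freedom).

```python
import math
from statistics import mean, variance


def pooled_t(x: list[float], y: list[float]) -> tuple[float, int]:
    """Student's two-sample t with pooled variance; returns (t, degrees of freedom)."""
    nx, ny = len(x), len(y)
    df = nx + ny - 2
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_var = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / df
    t = (mean(x) - mean(y)) / math.sqrt(pooled_var * (1 / nx + 1 / ny))
    return t, df


t, df = pooled_t([2, 3, 4], [1, 2, 3])
print(round(t, 3), df)  # 1.225 4
```

For RP scores the two lists would hold the scores of the top and bottom rated groups.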

Group III cases were ranked on the RP 
scores and on the average ratings of role-play- 
ing ability by the supervising counselors. The 
rank-order correlation between the two was .59; its difference from a zero correlation is significant between the 10 per cent and the 5 per cent levels of confidence. An attempt
was made to correct this correlation for the 
lack of complete reliability of the criterion 
measure and of the test. Two tenuous assump- 
tions were made—that the test reliability co- 
efficient obtained on Group II would be typi- 
cal of Group III, and that the criterion (rat- 
ings) reliability of Group I would be a rea- 
sonable estimate of the reliability of the rat- 
ings in Group III. This procedure gave a 
corrected correlation coefficient of .87. 
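The rank-order correlation and the correction for attenuation can be sketched as below. Using the test reliability of .69 (Group II) and the stepped-up criterion reliability of .66 (Group I) in the standard attenuation formula reproduces the corrected value of .87. That pairing of reliabilities is inferred from the figures in the text, and the function names are illustrative.

```python
import math
from statistics import mean


def average_ranks(values: list[float]) -> list[float]:
    """Rank from 1 upward, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def pearson_r(x: list[float], y: list[float]) -> float:
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(x: list[float], y: list[float]) -> float:
    """Rank-order correlation computed as the Pearson r of the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


def correct_for_attenuation(r_xy: float, r_xx: float, r_yy: float) -> float:
    """Estimated correlation between perfectly reliable versions of the two measures."""
    return r_xy / math.sqrt(r_xx * r_yy)


print(round(correct_for_attenuation(0.59, 0.69, 0.66), 2))  # 0.87
```

When there are no ties, the Pearson r of the ranks equals the familiar 1 - 6Σd²/(n(n² - 1)) formula. Averaging tied ranks keeps the coefficient defined when two students share a score.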

Because of the small Ns involved, the 
select nature of the samples (geographically 
and academically), and the use of only two 
check samples, the RP scale must be consider- 
ed as preliminary only. More evidence of the 
scale’s validity is needed. It is hoped that research
workers in counselor or clinical training pro- 
grams with access to existing data will carry 
this investigation further. 




OTHER ASPECTS OF THE MMPI ROLE-PLAYING KEY

1. Sex Differences. Tests for the significance of mean and variance differences between the scores of men and women were made for Group I, the group upon which RP was based. Neither F nor t revealed differences that could not have occurred through chance. Separate sex norms, therefore, are not indicated. Men and women have been combined for all groups in the study.

2. Ability Differences. Ability scores on the revised Army Alpha, Form 5 (Wells) and on the Atwell-Wells Wide Range Vocabulary Test, Form B, were available for 37 of the 50 subjects in Group II. F and t were computed using extreme groups (“high” RP scores, the upper 27 per cent, and “low” RP scores, the lower 27 per cent). RP scores were obtained from the second administration of the scale (or MMPI). Only chance differences in variances and means were found. It appears in this restricted sample that RP scores are not related to general ability.
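The F test for variance differences mentioned above compares two sample variances. A minimal sketch, assuming the common practice of placing the larger variance in the numerator; the ratio would then be referred to an F table with the returned degrees of freedom. The function name is illustrative.

```python
from statistics import variance


def variance_ratio(x: list[float], y: list[float]) -> tuple[float, int, int]:
    """Larger sample variance over smaller, with (numerator, denominator) df."""
    vx, vy = variance(x), variance(y)
    if vx >= vy:
        return vx / vy, len(x) - 1, len(y) - 1
    return vy / vx, len(y) - 1, len(x) - 1


print(variance_ratio([1, 2, 3], [2, 4, 6]))  # (4.0, 2, 2)
```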

3. Personality Differences. RP scores were derived for Group II from the MMPI administration (N = 37). Again the top (N = 10) and the bottom (N = 10) 27 per cent were used, and summary statistics were computed for raw scores for each sample on selected MMPI keys. Table 2 presents these data.

TABLE 2

SUMMARY STATISTICS AND t-TEST FOR “HIGH” AND “LOW” SCORERS ON THE ROLE-PLAYING ABILITY TEST ON CERTAIN MMPI SCALES (UNCORRECTED FOR K)

                  Mean            Standard Deviation
Scale     High RP   Low RP      High RP   Low RP        t     p
L           4.6      3.1         2.46      1.19       5.26   1%
F           2.4      5.0         2.06      2.76       7.17   1%
K          18.4     13.8         4.43      2.40       2.04   >5%, <10%
Hs          3.6      4.9         2.25      3.86       2.76   >1%, <5%
D          16.8     22.8         2.32      5.40       9.72   1%
Hy         23.2     21.2         2.46      4.62       4.35   1%
Pd         11.5     17.3         3.23      4.12      10.50   1%
Mf         36.7     35.8         3.16      3.99       1.68   >10%
Pa          8.7      8.2         1.85      3.40       1.23   >10%
Pt          7.4     15.0         6.09      6.03       8.41   1%
Sc          7.1     11.1         4.63      4.57       5.83   1%
Ma         14.8     19.3         3.22      3.10       9.55   1%
Si         16.8     26.9         3.39      8.51       2.44   >1%, <5%


As three men were included in these data, all scores were converted to the female norms (using the published MMPI norms for those scales where separate sex norms exist). It will be noted in Table 2 that the low role-players have significantly higher scores on the scales for Pt, Sc, Ma, Pd, D, and F. Those with high RP scores are significantly higher on Hy and L. Since all scores are within the normal range (actually between the forty-fifth and sixty-first percentiles on MMPI norms), as might be expected with this college group, clinical importance cannot be attached to these differences. It might be speculated, however, that persons with high RP scores are somewhat freer from the distress and perplexity usually manifested in cases with such MMPI profile elevations, but are also more characterized by the self-deceptive and repressive defenses associated with elevations on L and Hy.
4. Norms. No formal RP norms can or should be offered at this time. The largest groups of students, Groups I and II, could not be combined, as tests reveal that these samples almost certainly come from different populations. A cutting score of 19 or fewer items could eliminate most of the persons in Group I rated as “below average” in role-playing ability (about 20 per cent). This cutting score, however, is at the fiftieth percentile for Group II, the undergraduate majors in psychology.


SUMMARY 


1. This study has as its main purpose the 
heuristic isolation and quantification of role- 
playing ability as a “trait.” Further, an at- 
tempt to measure this behavior was believed 
worth while, for role-playing ability seems to 
be an important and unmeasured aspect of 
successful counseling and psychotherapy. 


2. A preliminary scale of 32 items purport- 
ing to measure role-playing ability was de- 
rived from the MMPI. Original subjects were 
50 graduate students in clinical and person- 
nel psychology at the University of Minne- 
sota. 




3. Split-half reliability of the criterion (multiple student ratings of role-playing ability) was .50, corrected to .66; test-retest reliability of the scale was .69 over a one-month interval on a second sample.


4. Two check samples with small Ns (50 undergraduate majors in psychology and 11 graduate students in a counseling practicum) were used as further tests of validity. Results are suggestive of the scale’s validity, but further validation is considered essential before the scale can be used for other than research purposes.

5. No significant sex differences or general ability differences were found on the RP scale. On the MMPI, good role-players (high RP scores) had significantly lower scores on the F, D, Pd, Pt, Sc, and Ma scales and significantly higher scores on the L and Hy scales than did poor role-players (low RP scores).

Received June 18, 1950


REFERENCES

1. Bavelas, A. Role playing and management training. Sociatry, 1947, 1, 183-191.
2. Bronfenbrenner, U., and Newcomb, T. M. Improvisations: an application of psychodrama in personality diagnosis. Sociatry, 1948, 1, 367-382.
3. Cameron, N. The paranoid pseudo-community. Amer. J. Sociol., 1943, 49, 32-38.
4. Cameron, N. The psychology of behavior disorders. Boston: Houghton Mifflin, 1947.
5. del Torto, J., and Cornyetz, P. Psychodrama as an expressive and projective technique. Sociometry, 1944, 7, 356-375.
6. Dymond, Rosalind F. A preliminary investigation of the relation of insight and empathy. J. consult. Psychol., 1948, 12, 228-233.
7. Dymond, Rosalind F. A scale for the measurement of empathic ability. J. consult. Psychol., 1949, 13, 127-133.
8. Franz, J. G. The place of the psychodrama in research. Sociometry, 1940, 3, 49-61.
9. French, J. R. P., Jr. Retraining an autocratic leader. J. abnorm. soc. Psychol., 1944, 39, 224-237.
10. Gough, H. G. A sociological theory of psychopathy. Amer. J. Sociol., 1948, 53, 359-366.
11. Kay, Lillian W., and Schick, Jane H. Role-practice in training depth interviewers. Sociometry, 1945, 8, 82-85.
12. Lippitt, R., Bradford, L. P., and Benne, K. D. Sociodramatic clarification of leader and group roles as a starting point for effective group functioning. Sociatry, 1947, 1, 82-91.
13. Moreno, J. L. Situation test. Sociometry, 1946, 9, 166-167.
14. Moreno, J. L. Spontaneity theory. Sociometry, 1944, 7, 339-356.
15. Rotter, J. B., and Wickens, D. D. Consistency and generality of ratings of social aggressiveness. J. consult. Psychol., 1948, 12, 234-239.
16. Sarbin, T. R. The concept of role taking. Sociometry, 1943, 6, 273-285.
17. Shor, J. A modified psychodrama technique for rehabilitation of military psychoneurotics. Sociatry, 1948, 1, 414-420.
18. Symonds, P. M. Role playing as a diagnostic procedure in the selection of leaders. Sociatry, 1947, 1, 43-50.











THE CONSTRUCTION AND VALIDATION OF 
TWO INSIGHT INVENTORIES 


DAVID GROSSMAN 


LOS ANGELES PSYCHOLOGICAL SERVICE CENTER

NEARLY all theories of psychotherapy place emphasis on insight as a factor in the improvement of a patient [12, p. 277; 16, p. 40], and questions long have been raised concerning the most effective method of facilitating insight. Recently, Hobbs [8] has expressed some doubt in connection with the relationship of insight to behavior and the usefulness of such a concept in psychotherapy. Despite the widespread usage and controversial nature of this concept, few attempts have been made to measure insight directly and objectively. Subjective criteria, in the main, have been used to ascertain the role of insight in psychotherapy [3, 18]. In the area of diagnostic testing such assessment usually is made by inference from personality test data, an approach which is both difficult and hazardous. A new projective test by Rapkin [14], however, has shown promise in this connection.

PREVIOUS STUDIES

It seems appropriate to state the definition of insight used in this study, since the term has been used with various meanings. Insight is a perceptual phenomenon indicating the correctness of the perception of one’s own behavior and motivation, and the correctness of the perception of one’s feelings and attitudes toward others. Existing perceptions which fulfill the criterion of correctness may be considered insightful ones, as may those acquired through the trial-and-error manipulation of symbols and objects or through a reorganization of the perceptual field.


One attempt to measure insight objectively was made by Steinmetz [20, 21], who had subjects answer the Guilford-Martin Personnel Inventory as others would answer it. The main emphasis seemed to be on assessing the ability to perceive the world as others do, rather than on the understanding of oneself. No scoring method or validation material is presented.

Similarly, Kimber [10] instructed subjects to answer items in the California Test of Personality, first, for the well-adjusted college student and, second, for themselves. Insight would seem to be the ability to comprehend the meaning behind an item. No definite scoring scheme for measuring insight is suggested, nor are any validation or reliability figures presented. Although there were significant differences between the first and second scores, it is difficult to determine what these discrepancies mean in terms of insight.


An attempt to measure “self-insight” was made by Gross [6]. His test consisted of a number of rating scales, from +2 to …, on which subjects rated themselves. The scales were concerned mainly with the various common defense mechanisms or items which might be considered “lie” items and to which most people would agree. This technique was found to have a significant relationship to admission of problems and to the Chapin Social Insight Scale, and a consistent agreement with “self-insight” as obtained from personality sketches. While this approach proved promising, the items seem to be too general for clinical or research purposes.


Dymond [4, 5] attempted to measure insight as defined by the “role theory.” Her definition seems to coincide rather closely with
what is usually conceived as empathy. The 
extent to which subjects took the role of 
characters mentioned in Thematic Appercep- 
tion Test stories was taken as a measure of in- 
sight. Dymond states that “empathy may be 
the underlying mechanism on which insight 
is based” [4, p. 233]. The present writer feels 
that while the two are related, it is insight 
which underlies empathy. Kimber, and Hoff- 
man and Lehner [9] have shown that one’s 
perception of the well-adjusted and the aver- 
age person is, to a large extent, a projection 
of one’s own feelings and attitudes. There- 
fore, if one does not have adequate self-under- 
standing, it would seem difficult to perceive 
correctly the world from the other person’s 
frame of reference (empathy). This is cor- 
roborated indirectly by studies in client- 
centered psychotherapy which indicate that 
acceptance of others follows acceptance of 
self [17, 22]. Criticisms of Dymond’s tech- 
nique are the subjective nature of assessing 
the degree to which subjects took the roles of 
TAT characters and the lack of reliability 
estimates. 


DESIGN AND SUBJECTS 


The design of this study parallels the de- 
sign suggested by Bray [2] as the most ac- 
ceptable one for validating attitude scales. In 
general, it consisted of assessing the subject’s 
knowledge of his attitudes, feelings, and be- 
havior; placing him in a concrete behavioral 
situation (the criterion); and obtaining the 
relationship between his stated attitudes, feel- 
ings, and behavior and his actual behavior. 
More specifically, the study consisted of constructing two measures of insight, one at a behavioral level and one at a feeling and attitudinal level. These tests were to be administered prior to, immediately following, and four weeks after a series of three 60-minute psychotherapeutic interviews. A correlation of the postinterview and follow-up test scores was expected to yield an estimate of the test-retest reliability of the measures, while a correlation between the therapist’s judgment of the subject’s insight and the postinterview test scores should give some idea of the validity of the instruments. The same individual served as therapist for all subjects. At the completion of the final interview the therapist rated each subject for insight on a graphic rating scale. The gradations used were: excellent, very good, moderate, fair, and poor.


For the study 20 male college students enrolled in Elementary Psychology were selected. These subjects were judged to possess only a mild degree of maladjustment on the
basis of their scores on the Guilford-Zimmer- 
man Temperament Survey [7]. They failed 
to answer in the significant direction any of 
the stop items on the Cornell Selectee Index. 
The screening process was thought desirable 
because of the small number of interviews 
planned. The subjects also had indicated 
their interest in participating by having check- 
ed “‘moderately” or “quite” on a rating scale 
ranging from “no,” to “mild,” to “moderate,” 
to “quite interested.” 


CONSTRUCTION OF INVENTORIES 


Self-Inventory Part I. The questionnaire method was used to obtain an estimate of how the individual consciously perceives his overt behavior. The Guilford-Zimmerman Temperament Survey, a personality test constructed from factor analyses of test items, was used for this purpose. The insight inventory based on this test consists of 12 definitions of the personality traits measured by the
test. The definitions were derived after an ex- 
amination of the items defining each factor. 
The subjects were asked to rate themselves in 
comparison with the male college population. 
Following each definition appeared a graphic 
rating scale with percentiles printed under 
the scale. For the sake of brevity, only one de- 
finition is presented here. 

The higher you rate yourself the more it indi- 
cates a tendency toward the following: high degree 
of energy, overt action, and general activity; liking 
for speed and being “on the go.” The lower you 
rate yourself the more it indicates a tendency to- 
ward the following: inertness and disinclination for 
motor activity. 

TWO INSIGHT INVENTORIES

The inventory was scored in the following manner. The test scores and the subject’s ratings of himself were converted into values on a scale whose unit was one-fourth of a standard score. Since no subject rated himself more than one and one-half standard deviations from the mean, the tails of the distribution of temperament scores were combined with the last sigma unit. Thus the scale consisted of 12 equal units. It was felt that this type of scoring would be more accurate than a straight comparison of percentile scores. After this conversion had been made, the discrepancies between the subjects’ ratings and their test scores were obtained and summed. It was assumed that the larger the discrepancy, the less insight the subject possessed into his behavior. One further precaution was exercised. The amount of insight an individual could possess was in part limited by his actual position in comparison with the college group. For example, an individual whose score fell in the middle of the group could have a discrepancy of only 6 points in either direction, while an individual at either end could have an 11-point discrepancy. To correct for this artifact, the maximum possible discrepancy for each subject was used as the denominator, rather than an absolute figure derived by multiplying the number of traits by the maximum scale units. To obtain the percentage of insight, the sum of the discrepancies was divided by the maximum possible fluctuation.
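What follows is a minimal Python sketch of the Part I scoring. It assumes that each trait score and each self-rating has already been expressed as a z score relative to the male college norms. The function names, the numbering of the twelve quarter-sigma units as 1 to 12, and the sample values are illustrative assumptions, not the author's materials. As the tails were in the study, scores beyond ±1.5 SD are folded into the end units.

```python
import math

N_UNITS = 12  # quarter-sigma units covering -1.5 SD to +1.5 SD (assumed numbering 1..12)


def to_scale_unit(z: float) -> int:
    """Map a z score onto units numbered 1..12, each one-fourth of a standard score.

    Values beyond +/-1.5 SD are folded into the end units (the combined tails).
    """
    unit = math.floor((z + 1.5) / 0.25) + 1
    return min(max(unit, 1), N_UNITS)


def max_discrepancy(test_unit: int) -> int:
    """Largest possible rating-minus-test discrepancy for a trait at this test position."""
    return max(test_unit - 1, N_UNITS - test_unit)


def part_one_ratio(test_z: list[float], rating_z: list[float]) -> float:
    """Summed discrepancies divided by the subject's maximum possible discrepancy.

    Under the study's assumption, a smaller ratio means more insight.
    """
    test_units = [to_scale_unit(z) for z in test_z]
    rating_units = [to_scale_unit(z) for z in rating_z]
    total = sum(abs(t - r) for t, r in zip(test_units, rating_units))
    ceiling = sum(max_discrepancy(t) for t in test_units)
    return total / ceiling


# Hypothetical subject with two traits (the actual inventory used 12).
print(part_one_ratio([0.0, -2.0], [1.0, -1.0]))
```

With this numbering, a test score in unit 6 or 7 allows a discrepancy of at most 6 units, and a score in unit 1 or 12 allows 11. This matches the limits described above. The hypothetical two-trait subject scores 6 of a possible 17 units.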

Self-Inventory Part II. The second insight 
inventory was based on the Thematic Apperception Test inasmuch as this test provides
information concerning the subject’s feelings 
and attitudes toward himself and others. The 
subjects wrote their own stories, as they were 
intelligent, literate individuals. No formal 
schema was used for scoring, and only the fol- 
lowing information was available: number 
and age of siblings, and whether the parents 
were living, dead, separated, or divorced. To 
offset the subjectivity of TAT scoring, two psy- 
chologists interpreted the stories independ- 
ently. 

Although the TAT consists of thirty 
cards, rarely in clinical practice is the full set 
used. Two approaches were employed in the 
selection of the most appropriate cards. One 
consisted of a survey of the literature, and 
the other of a study conducted by the writer. 
Twenty advanced clinical psychology interns 
and supervising psychologists were asked to 




name two areas of personality that they could always be sure of measuring by the TAT and to choose two or three cards that would best measure these areas. The results of the study indicate that attitudes toward parents (chosen by 15 of the 20 psychologists) and attitudes toward sex (chosen by 12 of the 20) were the predominant selections, followed by aggressive or hostile feelings and level of aspiration. Also, it was found that cards 6BM and 7BM were the unanimous choices for measurement of attitudes toward parents. Cards 4 and 13MF were selected unanimously by those who named the area of sex, and card 18GF was chosen by three of the five people who mentioned the area of hostility or aggression. Cards 1, 2, and 14 were chosen by three of the four people who named the area of aspiration level.


Among published research studies, the following proved helpful in selecting the cards: Bellak [1], Murray and Stein [13], Loeblowitz-Lennard and Reissman [11], Rapaport [15, p. 422], and Stein [19, pp. 5, 8]. On the basis of all the data, the following cards were chosen and administered in the sequence listed: 1, 6BM, 7BM, 4, 13MF, 3BM, and 12M.


The plan was to construct multiple-choice items to cover the feelings and attitudes (not behavior) expressed by the subjects to the TAT cards. The degree of discrepancy between the manner in which the subject answered the item and the manner in which the two psychologists felt he should have answered it on the basis of the TAT material indicated the amount of insight the subject possessed. Because of the idiosyncratic nature of each subject’s protocol, no single inventory could be constructed for all. A separate inventory was therefore prepared for each subject. It was decided that the number of items for each inventory should range from twenty to thirty, that at least one item should be included from each area (mother, father, aspiration level, sex, and therapy), and that each item would be used for as many subjects as possible. In general, one-third of the items constructed applied to more than one individual. The following are sample items.










11. When others set goals for me, I am inclined 
to: (a) Rebel against them. (b) Be happy with 
them. (c) Submit, but with dissatisfaction. (d) 
Others never set goals for me. 


15. In order for me to enjoy sexual intercourse, I 
feel: (a) I should be married. (b) It would 
be better to be married, but it is not necessary. 
(c) It does not matter if I am married or not. 
(d) The need for sexual intercourse is no 
problem to me. 


40. In general, for things to turn out well, my 
mother’s affection and approval are: (a) Quite 
important. (b) Of moderate importance. (c) 
Not too important. (d) Unimportant. 


The following method was adopted for the 
keying of choices. One psychologist interpret- 
ed the TAT protocols and keyed the choices 
which he felt would indicate the maximum 
amount of insight. This procedure was re- 
peated by the other psychologist. A compari- 
son was made, and if disagreements could not 
be resolved easily by a review of the protocol, 
the item was discarded. To establish weights 
for the choices, the following system was used. 
For items which were in effect four-point rating scales, the correct choice was given a weight of 0, indicating no disagreement and a maximum of insight. If the correct choice was at one end of the scale, the weight increased by 1 for each step the subject’s choice deviated from it. If the correct choice was not at
either end of the scale, those choices on either 
side of the correct choice were given a weight 
of 1, and the remaining choice a weight of 2. 
For those items which were in effect a three- 
point rating scale with an additional choice 
indicating a denial of the keyed choice, the 
same system was used, except that the denial 
choice was given the same weight as the most 
marked deviation. A large majority of items 
was handled by these procedures. For items 
which included different feelings as choices, 
the non-keyed choices were given a weight of 1 unless
one of the choices was a denial of the keyed 
choice, in which event it was given a weight 
of 2. The rationale for the weighting of the 
choices was to make the instrument as sen- 
sitive as possible. 
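The weighting rules can be expressed compactly. In this sketch the function names are hypothetical. The choices of a rating-scale item are assumed to be indexed in scale order, so a weight equal to the number of steps from the keyed choice reproduces both four-point rules in the text. For a three-point item with a denial choice, "most marked deviation" is read as the largest weight among the scale choices; this reading is an interpretation, not something the text states.

```python
from typing import Optional


def scale_weights(n_points: int, key: int, denial: bool = False) -> list[int]:
    """Weights for an item that is in effect an n-point rating scale.

    Choices are indexed 0..n_points-1 in scale order and `key` is the keyed choice.
    Weight = number of scale steps away from the key. On a four-point scale this
    gives 0,1,2,3 when the key is at an end, and neighbours 1 / far choice 2 when
    the key is inside. With denial=True an extra "denial" choice is appended and
    given the weight of the most marked deviation.
    """
    weights = [abs(i - key) for i in range(n_points)]
    if denial:
        weights.append(max(weights))
    return weights


def feeling_weights(n_choices: int, key: int, denial_index: Optional[int] = None) -> list[int]:
    """Weights for an item whose choices name different feelings.

    The keyed choice gets 0, every other choice 1, and a choice that denies
    the keyed feeling gets 2.
    """
    weights = [0 if i == key else 1 for i in range(n_choices)]
    if denial_index is not None:
        weights[denial_index] = 2
    return weights


print(scale_weights(4, 0))                     # [0, 1, 2, 3]
print(scale_weights(4, 2))                     # [2, 1, 0, 1]
print(scale_weights(3, 0, denial=True))        # [0, 1, 2, 2]
print(feeling_weights(4, 0, denial_index=3))   # [0, 1, 1, 2]
```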


Since each inventory consisted of a different number of items, the total weighted scores were converted into percentage scores. To accomplish this, the maximum amount of disagreement possible was obtained for each inventory. Then the discrepancies between the subject’s answers and the psychologists’ key were obtained and summed. This sum was subtracted from the maximum possible disagreement to give the amount of agreement, and the agreement was then divided by the maximum possible disagreement to give the percentage of the subject’s insight.

¹The author wishes to express his appreciation to Joel J. Steiner, Los Angeles, California, for his valuable assistance.
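The Part II conversion to a percentage score might look like this. The sketch assumes that an inventory's maximum possible disagreement is the sum, over items, of each item's largest weight; this is an interpretation. The function name and the three-item inventory are hypothetical.

```python
def part_two_insight(item_weights: list[list[int]], answers: list[int]) -> float:
    """Percentage of insight for one subject's individually keyed inventory.

    item_weights[i][c] is the weight of choice c on item i (0 = keyed choice);
    answers[i] is the index of the choice the subject marked.
    """
    max_disagreement = sum(max(weights) for weights in item_weights)
    disagreement = sum(weights[a] for weights, a in zip(item_weights, answers))
    agreement = max_disagreement - disagreement
    return 100.0 * agreement / max_disagreement


# Hypothetical three-item inventory: maximum disagreement 3 + 2 + 2 = 7.
items = [[0, 1, 2, 3], [1, 0, 1, 2], [1, 1, 0, 2]]
print(round(part_two_insight(items, [1, 1, 0]), 1))  # 71.4
```

Dividing by each inventory's own maximum is what makes scores comparable across inventories of twenty to thirty items.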


STATISTICAL RESULTS 

The Pearson product-moment correlation 
between the postinterview and follow-up test
scores for Self-Inventory Part I yielded a 
reliability estimate of .757. For Self-Inven- 
tory Part II the estimate was .850. It can be 
seen, then, that the reliability of these inven- 
tories is similar to that of most personality 
inventories. Since these are test-retest reli- 
abilities, they probably represent an under- 
estimate of the true reliability. 

The Pearson product-moment coefficient of 
correlation between the postinterview test scores 
for Self-Inventory Part I and the therapist’s 
ratings was .426 ± .188. For Part II the correlation was .474 ± .178. For an N of 20 with
18 degrees of freedom an r of .444 is required 
for significance at the 5 per cent level. If the 
therapist's judgment can be accepted as an 
adequate criterion of insight, then Part II 
showed a significant and moderate relation- 
ship with this criterion, and Part I showed a 
relationship not significant at the postulated 
level. Another suggestion of the validity of 
Part II is seen in the difference (approaching 
the 5 per cent level) between the preinter- 
view scores and the postinterview scores. Part 
I showed no significant difference. Part II 
therefore seems more sensitive to changes 
which took place during the interviews as a 
result of the therapist’s deliberate attempt to 
facilitate insight. 
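The critical value quoted above and the ± terms can be checked arithmetically. The figure 2.101 is supplied here and does not appear in the article: it is the tabled two-tailed 5 per cent point of t for 18 degrees of freedom. The article also does not name the formula behind the ± values. The classical standard error (1 - r^2)/sqrt(N - 1) reproduces .188 and .178, so the sketch assumes that formula.

```python
import math


def critical_r(t_crit: float, df: int) -> float:
    """Smallest |r| that reaches significance, given the critical t for df degrees of freedom."""
    return t_crit / math.sqrt(t_crit ** 2 + df)


def se_r(r: float, n: int) -> float:
    """Classical standard error of r, (1 - r^2) / sqrt(N - 1) (assumed formula)."""
    return (1 - r ** 2) / math.sqrt(n - 1)


N = 20
DF = N - 2
T_CRIT_05 = 2.101  # two-tailed 5% point of t with 18 df, from standard tables

print(round(critical_r(T_CRIT_05, DF), 3))                  # 0.444
print(round(se_r(0.426, N), 3), round(se_r(0.474, N), 3))  # 0.188 0.178
```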


DISCUSSION 


Although Part II shows promise as a reli- 
able and valid measure of insight, it is not 
without limitations. For one thing, it is difficult to determine from the TAT, except
within certain limits, how unaware a subject
is of his feelings and attitudes. This means 
that undoubtedly some items were more dif- 
ficult than others, but the extent of these dif- 
ferences was impossible to ascertain. It was 
assumed that the difficulty levels were ran- 
domly distributed. Secondly, there were limi- 
tations to the weighting system. The pro- 
cedure used assumes a linear regression, while 
it is possible that a curvilinear one exists, that 
is, the steps between the choices may not be 
equal. (A “rights only” score proved to be 
less reliable and valid.) Also, the procedure 
is time consuming and laborious for practical 
clinical use, although it may seem worth while 
for research purposes. A battery of two or 
three projective tests is suggested as a basis 
for constructing the inventory, making it pos- 
sible to increase the number of items for each 
subject, and perhaps to assess more accurately 
the degree of unawareness. 


An attempt was made to ascertain the rea- 
son for the lower reliability of Part I. In an
informal discussion with each subject upon 
completion of the testing, it became evident 
that the descriptions of the traits were inter- 
preted in various ways. The higher validity 
of Part II might be explained by the close 
relationship between the material discussed 
during the interviews and the TAT material. 
Despite lower reliability and validity, Part I 
would seem to hold more promise as a clinical 
tool for assessing insight. It is possible that 
the reliability could be increased if the traits 
were better defined and behavioral examples 
stated. Since this type of insight inven- 
tory is standardized, norms could easily be com- 
piled for various clinical groups. A suggested 
procedure for increasing the validity of both in- 
ventories, if they are to be used to assess im- 
provement due to psychotherapy, hospitali- 
zation, electric-shock treatment, etc., is to read- 
minister the personality tests after therapy. 
This would enable the experimenter to re-key 
the items more accurately and in accordance 
with the new feelings and behavior of the sub- 
ject. Also, insight inventories may be useful in 
investigating the most effective method for 
facilitating insight and in studying the relation 
between insight and overt behavior, empathy, 
and adjustment, as well as in other research problems.




SUMMARY 


A technique for measuring insight was de- 
scribed, as well as the construction of two trial 
inventories. A check was made on the validity 
and reliability of each. Part I proved to be less 
reliable and less valid (when compared to therapist’s ratings) than Part II, but seems more practical in that it is easier to construct and score. Part II showed a significant and moderate correlation with the criterion and holds promise as a research instrument. The limitations of the inventories were discussed and suggestions given for improving the technique.


Received June 2, 1950. 


REFERENCES

1. Bellak, L. The concept of projection: An experimental investigation and study of the concept. Psychiatry, 1944, 7, 353-370.
2. Bray, D. W. The prediction of behavior from two attitude scales. J. abnorm. soc. Psychol., 1950, 45, 64-84.
3. Curran, C. A. Personality factors in counseling. New York: Grune and Stratton, 1945.
4. Dymond, Rosalind F. A preliminary investigation of the relation of insight and empathy. J. consult. Psychol., 1948, 12, 228-233.
5. Dymond, Rosalind F. A scale for the measurement of empathic ability. J. consult. Psychol., 1949, 13, 127-133.
6. Gross, L. The construction and partial standardization of a scale for measuring self-insight. J. soc. Psychol., 1948, 28, 219-236.
7. Guilford, J. P., and Zimmerman, W. S. Temperament survey. Beverly Hills, Calif.: Sheridan Supply Co., 1948.
8. Hobbs, N. Insight in short-term psychotherapy. Amer. Psychologist, 1949, 4, 273.
9. Hoffman, R. L., and Lehner, G. F. J. Self ratings on personal adjustment and their relationship to scores for self and projected average scores on a personality test. Paper read at the Western Psychol. Assn., Santa Barbara, April, 1950.
10. Kimber, M. The insight of college students into the items on a personality test. Educ. psychol. Measmt., 1947, 7, 411-420.
11. Loeblowitz-Lennard, H., and Reissman, F. Recall in the TAT: An experimental investigation into the meaning of recall of phantasy with reference to personality diagnosis. J. Personality, 1945, 14, 41-46.
12. Maslow, A. H., and Mittelmann, B. Principles of abnormal psychology. New York: Harper, 1941.
13. Murray, H. A., and Stein, M. I. Note on the selection of combat officers. Psychosom. Med., 1943, 5, 386-391.
14. Rapkin, M. Preliminary validation study of the projective-motor test. Unpublished Doctor’s thesis, Univ. of Southern California, 1949.
15. Rapaport, D. Diagnostic psychological testing. Chicago: Year Book Publishers, Inc., 1946. Vol. II.
16. Rogers, C. R. Counseling and psychotherapy. Boston: Houghton Mifflin, 1942.
17. Sheerer, Elizabeth T. An analysis of the relationship between acceptance of and respect for self and acceptance of and respect for others in ten counseling cases. J. consult. Psychol., 1949, 13, 168-180.
18. Snyder, W. U. An investigation of the nature of non-directive psychotherapy. J. gen. Psychol., 1945, 33, 193-224.
19. Stein, M. I. The thematic apperception test. Cambridge, Mass.: Addison-Wesley Press, 1948.
20. Steinmetz, H. C. Directive psychotherapy: V. Measuring psychological understanding. J. clin. Psychol., 1945, 1, 331-335.
21. Steinmetz, H. C. Selecting personnel workers. Educ. psychol. Measmt., 1947, 7, 37-43.
22. Stock, Dorothy. An investigation into the interrelations between the self concept and feelings directed toward other persons and groups. J. consult. Psychol., 1949, 13, 154-156.











DIFFERENCES IN PREDICTION BASED ON HEARING
VERSUS READING VERBATIM CLINICAL
INTERVIEWS¹


JOSEPH LUFT

STANFORD UNIVERSITY


The human voice enters into many clinical judgments, but its influence seldom has been measured experimentally. J. E. Bell comments:

It is surprising that our most important means of communication, the voice, which in everyday life enters so strongly into our judgments of others, should have been utilized so infrequently in a controlled scientific manner for the determination of personality characteristics [4, p. 421].

Pennington and Berg also note the lack of research on the auditory component of interviews:

As yet, however, far too little attention has been paid by psychologists to the intimate relation between verbal expression and personality, despite the fact that the astute clinician makes daily use of word choices, manners of expression, and similar clues in the study and evaluation of his clients [11, p. 157].

With the improvement and rapid increase of the use of voice recorders in psychological clinics and laboratories, clinicians are confronted with new opportunities for psychological inquiry [12]. Every clinician, regardless of specialization, relies to some extent on voice cues in forming impressions of personality. Psychiatrists, social workers, and clinical psychologists deal with the patient in a face-to-face situation, whether it be intake interviewing, psychotherapy, or diagnostic testing. Records of these contacts with the patient are sometimes verbatim, as in testing, while therapy and intake records are usually summarized accounts of facts and impressions.

¹Published with permission of the Chief Medical Director, Department of Medicine and Surgery, Veterans Administration, who assumes no responsibility for the opinions expressed or conclusions drawn by the author.

The value of verbatim typescripts has been discussed by Allport [1], Covner [8], and others. But the significance of a voiced record as compared to a transcribed account of that record has apparently not been investigated in the clinical setting. Sanford [14] and Bell [4] have summarized the research in the field of voice and personality. Most of the studies
were concerned either with the value of hear- 
ing versus reading material from the learning 
point of view [7], or with atomistic analyses of 
speech. Typically such analyses involved the 
counting of parts of speech, description and 
style of speech, the effectiveness of speech, and 
the analysis of speech disorders. Cantril and 
Allport [5] describe experiments dealing with 
the matching of voice with other personality 
characteristics, the ability of the blind to judge 
personality from voice, and a study of listeners’ 
preferences for male and female voices. Several 
studies describe the voice characteristics of var- 
ious types of clinical patients. The matching of 
voices with personality sketches and evaluation 
of IQ from voice records have also been report- 
ed [14]. A few classification systems of voice 
variables may be found in Bell’s summary [4]. 

The purpose of this experiment was to ex- 
amine the relative effectiveness of listening to a 
clinical interview as compared to reading the 
same interview. The accuracy of impressions 
of the interviewee’s personality gained through 
both methods of observation was measured in 
terms of the ability to predict responses to clini- 
cal tests. 


PROCEDURE 
Part I. Fourteen undergraduate students 


who volunteered for the experiment were divid- 
ed at random into two groups. One group of 







students listened to a one-hour interview with 
patient WBT while the other group read a 
verbatim typescript of the same interview. 
Then both groups of students attempted to 
predict how the patient (or interviewee) would 
respond to two clinical tests. 

The procedure was then repeated except 
that a one-hour interview with another patient, 
JRR, was used and the groups of student- 
judges were reversed. Those who listened to 
the first case read the second, while those who 
read the first case listened to the second. Again 
the students attempted to predict how this 
second subject, JRR, would respond to the 
clinical test items. 

The students’ predictions were in all cases compared with what the subjects actually did on the tests. Each judge then had a score for each test, consisting of the total number of responses correctly predicted. This part of the experiment made possible two independent comparisons of the predictions of student-listeners with those of student-readers.


Part II. As part of another experiment, 66 
clinicians (20 psychiatrists, 18 social workers, 
and 28 clinical psychologists) were also asked 
to make predictions in the same way and on the 
same cases (WBT and JRR) after reading the 
transcribed verbatim interviews. Unlike the 
student-judges above, the clinicians were not 
permitted to listen to the wire recordings of the 
interviews. Since the procedures for Part I and 
Part II were identical except for the differences 
in the way the interviews were perceived, we 
could again compare the efficacy of reading 
versus listening by comparing this time the 
number of correct predictions for the student- 
listeners as against the clinician-readers. In 
making these group comparisons, special atten- 
tion was given to the number of judges in each 
group who were able to predict convincingly, 
i.e., who were able to achieve prediction scores
which are significantly different from chance 
at least at the five per cent level. 

The Interviews. The subjects WBT and 
JRR were an asthma and peptic ulcer case re- 
spectively, who were referred by their ward 
physicians to the psychological clinic. The in- 
terviews, which were held as part of the psy- 
chological evaluation procedures, were conduct- 
ed in a permissive and flexible manner so that 
the patients could express themselves freely 


and spontaneously. The wire recordings were 
made with their understanding and permission. 
Names of patients and similar identification 
were, of course, changed in the verbatim type- 
scripts, but this in no way affected the content 
of the manuscripts. 


PREDICTING TEST RESPONSES AS 
A VALIDATION TECHNIQUE 

The technique we have employed to measure 
the validity of observers’ impressions of the patients’ personalities has been called “controlled
cross-prediction” by Allport [1]. This tech- 
nique has been used by Cartwright and French 
[6], Hanks [9], Kelly, Terman, and Miles 
[10], Steinmetz [15], and others in a variety 
of personality studies. Basically, the rationale 
assumes that there is a direct relationship be- 
tween understanding an individual and the 
ability to predict how that individual would re- 
spond to different kinds of verbal stimuli. Such 
prediction may be ordered to the more general 
category of social anticipation or expectancy, 
which is of such crucial importance in social in- 
teraction. Our daily lives as individuals and as 
members of groups depend on our ability to 
anticipate to some degree how other persons 
will behave. Our inability to predict the be- 
havior of others indicates a failure to under- 
stand and tends to result in a breakdown of 
communication and interaction. 


THE OBJECTIVE PREDICTION INSTRUMENT 


The impressions gained through reading or 
listening to an interview form the basis for 
predicting how the interviewee will respond to 
an objective-type test and a projective-type test. 
The objective test is a 60-item yes-no-type of 
questionnaire adapted from Bell’s Adjustment 
Inventory [3]. The items were screened ex- 
perimentally by asking a group of 38 persons to 
answer the 160 Bell items as they thought the 
average person would answer them. Those 
items which tended to be universally answered 
in a particular way were omitted. As a hy- 
pothetical example, an item such as “Do you 
try to excel in your chosen field of work?” 
would be omitted because in our society the 
response would be easy to predict. However, an 
item such as “Do you find it easy to ask others 
for help?” was found empirically to yield about 
an equal number of yes and no responses. 
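The screening step can be sketched as a filter on the screeners' predicted answers. The 0.8 cutoff, the function name and the tallies below are hypothetical, because the article reports neither the cutoff nor the counts. Only the idea of discarding items that are answered nearly unanimously one way comes from the text.

```python
def screen_items(predicted: dict[str, list[bool]], max_share: float = 0.8) -> list[str]:
    """Drop items whose predicted answers are close to unanimous.

    predicted maps an item to the yes (True) / no (False) answers screeners gave
    when asked how the average person would respond. max_share is a hypothetical
    cutoff; the article does not report the one actually used.
    """
    kept = []
    for item, answers in predicted.items():
        yes_share = sum(answers) / len(answers)
        if 1 - max_share <= yes_share <= max_share:
            kept.append(item)
    return kept


# Hypothetical tallies for 38 screeners.
tallies = {
    "Do you try to excel in your chosen field of work?": [True] * 36 + [False] * 2,
    "Do you find it easy to ask others for help?": [True] * 19 + [False] * 19,
}
print(screen_items(tallies))  # keeps only the second item
```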

The judge, in forecasting the response to a 
















question, must take into consideration the pos- 
sibility that the subject may cover up or exag- 
gerate or falsify his “true” responses. In other 
words, he must be cognizant of the nature of 
the subject's ego-needs and his defenses, as well 
as the degree of insight which the subject pos- 
sesses. But it is just these influences which the 
observer is attempting to discover in studying 
a subject. It therefore follows that the better 
the understanding, the larger will be the num- 
ber of items which the judge correctly pre- 
dicts. There is also experimental evidence in 
the literature confirming this relationship [see 
especially 6, 9, and 15].

THE PROJECTIVE PREDICTION INSTRUMENT

The second test used for prediction purposes is a 35-item multiple-choice projective technique based on the OSS form of the sentence-completion test [17]. Research evidence indicates that this device tends to tap deeper aspects of the subject’s personality [16, 17]. The harmless-looking stimulus, together with the pressure for speed, tends to evoke genotypically significant material without the usual defenses interposed. For purposes of this study, ability to predict responses to objective questions requires understanding at a more overt or phenotypic level, while ability to predict projective responses requires understanding at a more covert or genotypic level.

Three patients in addition to the two experimental subjects were given the OSS form in the usual way. Then the items which yielded the most varied responses were selected. Thirty-five of the original 100 OSS items were thus included. Each of the five different responses per item was presented in a multiple-choice arrangement similar to the example herewith presented:


He thinks of himself as— 
a. a big shot 
b. not much 
c. quite a ladies’ man 
d. a flyer 
e. a good follower 
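One way to express the selection of the 35 items is sketched below. It assumes that "most varied" means the largest number of distinct completions among the five respondents. The article does not define the measure, and the function name, the second stem and its answers are hypothetical.

```python
def most_varied(completions: dict[str, list[str]], k: int) -> list[str]:
    """Return the k sentence stems whose completions are most varied.

    completions maps a stem to the answers given by the respondents (five here:
    three extra patients plus the two experimental subjects). Variety is measured
    as the number of distinct answers; ties keep their original order because
    Python's sort is stable even with reverse=True.
    """
    ranked = sorted(completions, key=lambda stem: len(set(completions[stem])), reverse=True)
    return ranked[:k]


stems = {
    "He thinks of himself as": [
        "a big shot", "not much", "quite a ladies' man", "a flyer", "a good follower",
    ],
    "Hypothetical stem B": ["his mother", "his mother", "work", "his mother", "work"],
}
print(most_varied(stems, 1))  # ['He thinks of himself as']
```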


The judge was required to identify the subject’s projection from among the others on the basis of either hearing or reading an interview with that subject. Since there are five alternatives for each of the 35 items, it was possible to calculate that a score of 12 or more correctly identified responses was necessary for a prediction score significantly different from chance at the 5 per cent level.
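The chance threshold of 12 can be recovered from the binomial distribution. The article does not show its calculation, so the sketch makes two assumptions: a judge who is guessing picks each of the five alternatives with probability 1/5, independently on every item, and a one-tailed test is intended.

```python
from math import comb


def upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))


def min_significant_score(n: int, p: float, alpha: float = 0.05) -> int:
    """Smallest score whose one-tailed chance probability is below alpha."""
    for k in range(n + 1):
        if upper_tail(k, n, p) < alpha:
            return k
    raise ValueError("no score reaches significance")


print(min_significant_score(35, 1 / 5))  # 12
```

Under these assumptions P(X ≥ 11) is about .07 and P(X ≥ 12) is about .03. Twelve is therefore the smallest score that falls below the 5 per cent level.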


RESULTS 
Each student-judge had two scores, one for the number of items correctly predicted on the objective test and one for the projective test. Table 1 compares the mean scores on the objective test for student-listeners against student-readers. The comparisons cover both interviews, and Fisher’s t-test for the significance of the difference between means is applied. Table 2 presents the mean projective test scores of student-listeners versus student-readers, with the application of Fisher’s t-test to both experimental cases.

Table 3 shows the percentages of each group of judges who were able to identify a significant number of responses on the projective test. Each category of clinician-readers is represented, as well as the percentages of student-readers and student-listeners who correctly identified a significant number of the interviewees’ responses to the sentence-completion test.


TABLE 1
COMPARISON OF THE AVERAGE NUMBER OF RESPONSES CORRECTLY PREDICTED BY STUDENT-LISTENERS AND BY STUDENT-READERS ON OBJECTIVE TEST

                      Case WBT               Case JRR
                  Reading   Hearing     Reading   Hearing
Mean               36.86     37.14       36.14     34.86
S.D.                4.97      5.64        4.3       4.22
S.D. of mean        2.03      2.30        1.77      1.72
Difference               .28                  1.28
S.D. of diff.           3.07                  2.46
t                        .09                   .52


TABLE 2
COMPARISON OF THE AVERAGE NUMBER OF RESPONSES CORRECTLY PREDICTED BY STUDENT-LISTENERS AND BY STUDENT-READERS ON PROJECTIVE TEST

                      Case WBT               Case JRR
                  Reading   Hearing     Reading   Hearing
Mean                7.00      8.86        8.43      9.29
S.D.                0.75      2.41        1.76      2.91
S.D. of mean        0.31      0.98        0.72      1.19
Difference              1.86                   .86
S.D. of diff.           1.03                  1.39
t                       1.81                   .62
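The t values in Tables 1 and 2 can be reproduced from the tabled means and standard deviations. The following reading is inferred from the tabled figures, not stated in the article: the standard error of each mean appears to be S.D./sqrt(N - 1) with N = 7, and the two standard errors are combined as for independent groups. The sketch below follows that reading, and the function name is hypothetical.

```python
import math


def t_from_summary(mean_a: float, sd_a: float, mean_b: float, sd_b: float, n: int) -> float:
    """t for the difference between two independent group means, each of size n.

    Uses the standard error of a mean as SD / sqrt(n - 1), which matches the
    tabled "S.D. of mean" values for n = 7, and combines the two standard
    errors as for uncorrelated groups.
    """
    se_a = sd_a / math.sqrt(n - 1)
    se_b = sd_b / math.sqrt(n - 1)
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)
    return abs(mean_b - mean_a) / se_diff


# Reading vs. hearing, seven judges per group (values from Tables 1 and 2).
print(round(t_from_summary(36.86, 4.97, 37.14, 5.64, 7), 2))  # Table 1, case WBT
print(round(t_from_summary(7.00, 0.75, 8.86, 2.41, 7), 2))    # Table 2, case WBT
```

These calls give .09 and 1.81, which agree with the tabled values to two decimals.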







TABLE 3
PERCENTAGES OF JUDGES IN EACH CATEGORY WHO CORRECTLY IDENTIFIED A SIGNIFICANT* NUMBER OF RESPONSES ON THE PROJECTIVE TEST

                                 Student-     Student-   Psychiatrist-  Social worker-  Clinical psychologist-
                                 listeners**  readers    readers        readers         readers
Total no. of judges                  7           7          20             18              28
Per cent achieving significant
  scores for case WBT              28.0         0.0        15.0            5.5            10.7
Per cent achieving significant
  scores for case JRR              28.0         0.0        10.0           16.7            17.8

*Twelve correctly identified responses on the 35-item test constitute a score which is reliably different from chance at the 5 per cent level.

**The seven students who listened to case WBT were not the same students who listened to case JRR; see procedure.


DISCUSSION 


The student-readers did as well as the stu- 
dent-listeners in predicting the patients’ re- 
sponses to the objective yes-no-type of ques- 
tionnaire (Table 1). However, on the pro- 
jective test, the differences are consider- 
ably larger and consistently in favor of the 
listeners. In one case (WBT), the chances 
are 85 in 100 that the differences were not due 
to chance. The superiority of the listeners is 
shown more dramatically when we compare the 
number of judges in each group who were able to
identify a significant number of projections. As 
mentioned above, a judge who correctly 
identifies 12 of the subject’s 35 responses is do- 
ing better than chance at the 5 per cent level 
of significance. It would be reasonable to as- 
sume that anyone obtaining such a score is not 
guessing. 


Two listeners of seven were able to identify 
12 or more projective responses for case WBT, 
but not a single one of the student-readers 
could do this. When the two student groups 
were reversed so that the listeners now became 
readers and the original group of readers was 
permitted to listen to the second case, we still 
obtained the same results. That is, two of seven 
(or 28 per cent) of the new listeners were able 
to identify accurately a significant number of 
the interviewee’s projections but not a single 


one of the new readers could achieve this. The 
results from the use of recorded interviews 
with two different cases, WBT and JRR, and 
the reversal of the listeners and readers to 
cancel out individual differences, tend to sup- 
port the contention that important qualitative 
differences exist between these two methods of 
observing an interview. 

It was fortunate that we had the results of 
another experiment with clinicians in which the 
same two recorded interviews were used. As 
mentioned in Part II of the procedure, a num- 
ber of psychiatrists, social workers, and clini- 
cal psychologists also read the interviews and 
made predictions just as our student-readers 
did. The most interesting results for purposes 
of this paper are presented in summary form in 
Table 3. The table shows that a number of 
clinician-readers were able to identify a significant number of projective responses in both
cases, whereas none of the student-readers 
could achieve this. Apparently, the experience 
and training in psychodynamics made it pos- 
sible for some of the clinicians to get at the 
more covert manifestations of the interviewee’s 
personality, which the students could do only 
after they had heard the interview in action. 


When we compare the student-listeners with 
the clinical specialists who read the interviews, 
the advantage swings over to the students. 
Again, the results of both cases independently 
support this tendency. Without applying criti- 
cal statistical tests but by noting the trends of 
the data from case to case, it becomes increasingly
evident that there is a distinct advantage
gained by listening, at least as far as identifying 
projective responses is concerned. 

Even though the listener is unable to stop and 
skim back over the material as a reader can, 
still the listener has the additional source of 
personality cues in the subject’s voice. Is the 
voice weak, thin, and hesitant, or is it full, 
strong, and confident? What about the rate 
of speech and the subtle indicators of tension?
Are there peculiar variations as certain materi- 
al is touched upon? These and a variety of other 
variables such as resonance, loudness, breathi- 
ness, and pitch are combined in a spontaneous 
manner habitual to the subject as he attempts 
to explain to the interviewer what he thinks 
and feels. The listener, of course, also hears
the interviewer and the interaction between
interviewer and subject. The timing element
and periods of silence are probably other important
factors which are not available to the
reader. Important voice cues appear to be present
even though electric recording techniques
tend to muffle or even exclude entirely certain
of the more sensitive nuances of voice variations [4].

Speech clinicians have long been cognizant 
of the significant relationship between voice 
and personality. That this relationship is un- 
consciously taken for granted by most people is 
pointed out by one speech authority: “Whether
we like it or not and whether we are aware of 
it or not, indissoluble ties have been established 
in the mind of the average individual between 
certain characteristics of voice and correspond- 
ing types and traits of personality” [2, p. 6]. 


Psychoanalytically oriented psychologists ap- 
pear to be especially interested in the way in 
which something is expressed. “ . . . Hearing is 
believing,” says Reik [13] in his recent book 
in which he discusses the expression of feeling 
during the analytic hour. “The most personal
factor of these emotions, the intimacy of the 
inner experience is, it is true, not sayable, but 
its reflex will communicate itself like a song 
without words and express emotions that the 
listener in his turn will sense” [13, p. 34]. 


SUMMARY 


The purpose of this study was to examine 
the relative effectiveness of listening to a clini- 
cal interview as compared to reading the same 
interview. By use of a prediction technique to 
measure judgment, it was found that readers 
and listeners did equally well in predicting re- 
sponses to objective clinical test items. Identify- 
ing projective test responses, however, seemed 
to be easier for the listeners. Twenty-eight per
cent of the student-listeners were able to exceed 
chance scores while none of the student-readers 
could do better than chance on the projective 
test. Exactly the same results obtained even 
though a second case interview was used and 
the groups of listeners and readers were revers- 
ed. When predictions were based on impres- 
sions gained only through reading the inter- 
views, groups of psychiatrists, social workers, 
and clinical psychologists were superior to
groups of undergraduate students on the projective
test. However, students who listened to
the interviews did consistently better in identifying
projective responses than the clinical
specialists who read the same cases. It is sug-
gested that the voice in spontaneous speech 
tends to externalize significant underlying 
aspects of the personality which may not be 
apparent in the content of the speech. 


Received July 24, 1950. 


REFERENCES 


1. Allport, G. W. The use of personal documents in psychological science. Soc. Sci. Res. Coun. Bull., 1942, No. 49.

2. Anderson, V. A. Training the speaking voice. New York: Oxford Univ. Press, 1942.

3. Bell, H. M. The theory and practice of personal counseling; with special reference to the adjustment inventory. Stanford, Calif.: Stanford Univ. Press, 1939.

4. Bell, J. E. Projective techniques. New York: Longmans, Green, 1948.

5. Cantril, H., and Allport, G. W. The psychology of radio. New York: Harper, 1935.

6. Cartwright, D., and French, J., Jr. The reliability of life history studies. Character & Pers., 1939, 2, 110-119.

7. Carver, M. E. Listening versus reading. In H. Cantril and G. W. Allport, The psychology of radio. New York: Harper, 1935.

8. Covner, B. J. Use of phonographic recordings in counseling practice and research. J. consult. Psychol., 1942, 6, 105-113.

9. Hanks, L. C. Prediction from case material. Arch. Psychol., N. Y., 1936, No. 207.

10. Kelly, E. L., Miles, C. C., and Terman, L. M. Ability to influence one’s score on a typical paper and pencil test of personality. Character & Pers., 1936, 3, 206-215.

11. Pennington, L. A., and Berg, I. A. (Eds.) An introduction to clinical psychology. New York: Ronald Press, 1948.

12. Raimy, V. C. Functional specifications for a sound recorder for the psychological clinic. Amer. Psychologist, 1948, 3, 513-518.

13. Reik, Theodor. Listening with the third ear. New York: Farrar, Straus, 1949.

14. Sanford, F. H. Speech and personality. In L. A. Pennington and I. A. Berg (Eds.), An introduction to clinical psychology. New York: Ronald Press, 1948. Pp. 157-177.

15. Steinmetz, H. C. Measuring psychological understanding. J. clin. Psychol., 1945, 4, 331-335.

16. Symonds, P. M. The sentence-completion test as a projective technique. J. abnorm. soc. Psychol., 1947, 42, 320-329.

17. The assessment of men. OSS Assessment Staff. New York: Rinehart, 1948.








A FORERUNNER OF RORSCHACH

MILTON S. GURVITZ

THE HILLSIDE HOSPITAL


RORSCHACH’S work as summarized in
Psychodiagnostics is so all-inclusive and
thorough for a first publication on an
unknown topic, that much of what we have
done since is only elaboration and refinement.
It is even more remarkable that Rorschach’s 
work seems to have been accomplished in a psy- 
chological and historical vacuum, since the pre- 
vious historical reviews of the pre-Rorschach 
use of inkblots have discovered nothing signifi- 
cant in the use of inkblots for personality or 
diagnostic evaluation, and, more importantly, 
have not shown that Rorschach was aware of 
such previous investigations. 


Both Krugman [3] and Tulchin [4] in
their comprehensive reviews have, however, 
neglected the one study which was known to 
Rorschach, since he makes direct reference to it 
[2, p. 102]. The study by Hens [1] summar- 
ized the results of an experiment carried on in 
Switzerland in 1916 with 1000 school chil- 
dren, 100 normal adults, and 100 patients of 
varied diagnoses. Most of the patients were 
tested at the Burghölzli. The experiment was
carried on under the supervision of Bleuler 
and consisted in getting responses to the eight 
black and white inkblots reproduced at the 
end of the monograph [1]. 


Hens’s study, although extensive and com- 
petent, was not graced by the clinical insight 
and creativeness that enabled Rorschach to 
produce the instrument and method which we 
know today. Nevertheless, it anticipated in em- 
bryonic form a surprising amount of material 
now used in Rorschach theory and diagnosis. 


Hens’s concept of projection was:


We never, of course, perceive as much with our 
senses as with our psyche, i.e., we add even to our 
every day perceptions something from the treasure 
of our previous experiences. We interpret the incomplete
perception-complex, influenced by our exper-
iences, in the direction of reality. This function we 
do not call phantasy or imagination, and as far as 
we know, until now both functions are connected 
only to the degree that imaginative individuals pos- 
sess greater possibility and greater tendency towards 
more far-reaching interpretations that deviate more 
widely from reality [1, p. 7] 

The concept of scoring as evolved by 
Rorschach never occurred to Hens, and inter- 
pretations were made from content, although 
location and relation of detail to whole were 
taken into consideration. 

In analyzing the records of his thousand 
children (including some who were feeble- 
minded) Hens was struck to a great extent by 
the effect of reality factors and immediate as- 
sociations. Responses were influenced by classes 
taken just before the blots were administered. 
Children with a high socioeconomic status or 
with high intelligence were noted as giving 
more responses and responses with richer 
phantasy. The personality of the teacher ap- 
peared to influence greatly the imaginal pro- 
ductivity of the students. Sex differences were 
minimal. 

In adults, similar factors were operative, but 
mood played an important role in determining 
the number of responses and the richness of the 
phantasy. Euphoria heightened phantasy, de- 
pression inhibited it. 

Hens quotes many examples of pathological, 
autistic thinking obtained from schizophrenic 
records, but he did not believe that the diag- 
nosis could be made from the test records 
alone: “If one cannot always diagnose the dis- 
order or its type through examining the phanta- 
sy, at least one can often recognize the presence 
of disease early in the examination” [1, p. 29]. 

The depressives were characterized by an in- 
hibition of thought, an inability to shift from 
one concept to another and retardation in
thinking. Direct depressive references in the
content are cited. Epileptics were characterized 
by retardation of thought plus extreme dif- 
fuseness and circumstantiality in those phanta- 
sies which were presented. Perseveration was 
also frequent. 

Perseveration, however, seemed most char- 
acteristic of the organic and was most diag- 
nostic when associated with a constricted rec- 
ord. Content was utilized to show the pro- 
jection of a feeling of disturbed body image 
onto human figures in the blots, although no 
real concept of M was formulated. 

Oligophrenics were characterized by a pov- 
erty of associations and their emphasis on the 
“close at hand and the commonplace” [1, p. 
37]. Except in cases of pseudologia phantastica, 
Hens considered that the phantasy productions 
of neurotics proved of little diagnostic value, 
although offering a reliable clue to their com- 
plexes. Alcoholics and psychopaths did not ap- 
pear to be distinguishable through the inkblots. 

Patients in general were characterized by 
an inability to enter into the situation as a 
matter of “sport” and took the test too serious-
ly, with some schizophrenics overreacting to 
their own peculiar phantasies. 

The production of phantasies was considered 
a fluctuating ability with experimental evidence 
to show that it is at its maximum in the morn- 
ing, and at a minimum in the evening and after 
food intake. Alcohol appeared to inhibit phan- 
tasy. 

Finally, the author predicted that colored 
cards would lead to greater understandings, 
especially in the realm of feeling and mood. 

The study closes with the following re- 
markable prophecy: 

And finally, it is possible, that future phantasy tests
with mental patients might accelerate the confirmation
of a difficult diagnosis and become in other
cases an important diagnostic aid.

Surely, the near future will bring evidence of it
[1, p. 64].

Although it took Rorschach’s genius and
clinical experience to complete the prediction,
it would seem a reasonable inference that
Hens’s study must have had more of an influ- 
ence on Rorschach than Rorschach has perhaps 
acknowledged [2, p. 102]. Hens certainly 
pointed out the unprofitable areas and, indeed, 
without the benefit of Rorschach’s scoring 
anticipated many of the diagnostic factors. 
After Hens, it was obvious that more radical 
methods of analyzing protocols were necessary. 

Aside from the abilities of the two men, 
which cannot be assessed here, there is an im- 
mediate, striking difference in their clinical ex- 
perience. Hens was a senior medical student 
with only limited professional experience with 
patients; Rorschach had more than ten inten- 
sive years in the practice of psychiatry. While 
Hens was limited to the material from the ink- 
blots, Rorschach had the benefit of an intimate 
knowledge of his patients. The conclusions are
inescapable. Hens’s abortive, embryonic work 
has been all but forgotten; Rorschach’s work 
has won the position of our most widely used 
and comprehensive diagnostic and personality 
evaluator. 


Only a biographical study of Rorschach 
could demonstrate the exact influence of Hens, 
but it is possible to say that Psychodiagnostics 
did not spring full-blown, as it were, from the 
brain of Rorschach. There were antecedents, 
and it would not be surprising to discover that 
Hens and Rorschach received their inspiration 
from Bleuler, the friend and teacher of both. 


Received August 4, 1950. 


REFERENCES 


1. Hens, S. Phantasieprüfung mit formlosen Klecksen bei Schulkindern, Erwachsenen und Geisteskranken. (Testing the imagination of school children, adults and mental patients by means of formless blots.) Zurich: Speidel and Wurzel, 1917.

2. Rorschach, H. Psychodiagnostics. (4th Ed.) Berne: Hans Huber, 1942.

3. Krugman, M. Out of the inkwell. Rorschach Res. Exch., 1940, 4, 91-110.

4. Tulchin, S. H. The pre-Rorschach use of inkblots. Rorschach Res. Exch., 1940, 4, 1-7.








COLOR AND THE VALIDITY OF THE RORSCHACH
8-9-10 PER CENT¹

JANET A. PERLMAN²


ONE empirically derived assumption
concerning the indirect effect of color
as it functions in the Rorschach test
implies that color, regardless of other effects
it might exert, affects productivity. Klopfer 
[1] suggested the use of the 8-9-10 per cent 
as an indicator of the effect of color on pro- 
ductivity. He estimated that, because the last 
three cards facilitate large detail responses, 
the expected per cent is 40. Underproduction, 
under 30 per cent, is considered a function of 
either lack of stimulation by color or the dis- 
turbing effect of color. Overproduction, sus- 
pected when between 40 and 50 per cent and 
definite when over 50 per cent, is assumed to 
be a function of the stimulating effect of 
color. Sappenfield and Buker [2], using 
group Rorschach technique, attempted to test 
the validity of the 8-9-10 per cent as an indi- 
cator of responsiveness to color. They found 
no significant differences in distribution, mean, 
and variability of the 8-9-10 per cent between 
the results obtained with the Harrower-Erick- 
son Kodaslides and an achromatic series of 
slides. 
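To make the measure concrete, the sketch below computes the 8-9-10 per cent as the share of all responses given to the last three cards and labels it with the cut-off points attributed to Klopfer above. It is a minimal illustration, not Klopfer's scoring procedure. The function names, the ten-element list of per-card response counts, the label for values between 30 and 40 per cent, and the handling of values falling exactly on 30, 40, or 50 per cent are all assumptions.

```python
"""Hypothetical sketch of the 8-9-10 per cent and Klopfer's cut-off points."""


def eight_nine_ten_percent(responses_per_card):
    """Return the per cent of all responses given to Cards VIII, IX, and X.

    responses_per_card: ten response counts, Card I first (assumed input format).
    """
    if len(responses_per_card) != 10:
        raise ValueError("expected one response count for each of the ten cards")
    total = sum(responses_per_card)
    if total == 0:
        raise ValueError("the record contains no responses")
    last_three = sum(responses_per_card[7:])  # indices 7, 8, 9 are Cards VIII-X
    return 100.0 * last_three / total


def klopfer_category(percent):
    """Label a per cent with the cut-off points described in the text.

    Boundary handling (e.g. exactly 40 or 50 per cent) is an assumption.
    """
    if percent < 30:
        return "underproduction"
    if percent > 50:
        return "definite overproduction"
    if percent > 40:
        return "suspected overproduction"
    return "no over- or underproduction indicated"


record = [4, 3, 5, 4, 3, 4, 5, 6, 5, 6]  # invented counts for illustration
pct = eight_nine_ten_percent(record)
print(f"{pct:.1f} per cent: {klopfer_category(pct)}")
# 37.8 per cent: no over- or underproduction indicated
```

The example record gives 17 of 45 responses on the last three cards.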

The purpose of the present study was to 
test, with individual administration, the as- 
sumption that productivity on the colored 
Rorschach cards, as measured by the 8-9-10 
per cent, is a function of color. If individual 
differences in productivity are strongly related 
to color, then in the absence of color there 
should theoretically occur no values at 30 per
cent or below, few between 40 and 50 per
cent, and none above 50 per cent. When color
is added to the achromatic stimulus, there
should be a significantly greater variability in
the group than when color is not present.
The testing of this hypothesis requires, first,
an examination of the results obtained with an
achromatic series of Rorschach blots and then
an evaluation of the effect of adding color to
the stimulus material.

¹Part of a more comprehensive study on color in the Rorschach test made in the Department of Psychology, Cornell University. Grateful acknowledgment is made to Mr. A. Smith, Cornell University, who prepared the achromatic series of blots, and to the professional staff of the Syracuse Psychopathic Hospital for generous cooperation which made it possible to include a sample of patients in the study.

²Deceased, November 16, 1950.


MATERIALS AND METHODS 

Materials. One set of standard Rorschach 
cards and a photographically reproduced ach- 
romatic set were used. Special efforts were 
exerted to maintain the brightness relation- 
ships of the standard blots in the reproduced 
set. Photographic negatives of the standard 
Rorschach blots were made on panchromatic 
half-tone film (Panatomic-X) and all back- 
ground and white-space areas on the negatives 
were opaqued. From these negatives contact 
prints were made on glossy bromide paper 
and the prints were dried semi-matte. This 
procedure was not sufficient, however, to correct
minor distortions in the brightness relationships.
Therefore, each negative was
printed until the lightest area of the print
matched that of the original card. Before
the prints were dried, a potassium ferricyanide
bleach was applied to those areas which
appeared too dark. All ten cards were treated
in the same manner, although the bleaching
technique was not required for the standard
noncolored cards.* A single achromatic set
was used for all subjects.

*The matching was done by an experienced photographer. It was not possible to perform experiments to insure that an exact match between the various areas of the reproduced set and the standard blots had been obtained. However, the spontaneous comments made by subjects on Test II indicated that they believed they were seeing the same noncolored cards as they had seen on Test I.
TABLE 1

MEDIAN AGE, SEX, AND MEDIAN EDUCATION OF THE SUBJECTS

                     Group I   Group II   Group III    Group IV
Median age           18+       20+        29+          25+
Median education     Col.-1    Col.-8     Col.-Grad.   HS Grad.
Number male          9         7          4            17
Number female        9         3          4            17
Total number         18        10         8            34


Subjects. There were two classes of subjects: (1) university students who were divided into three groups (I, II, and III) and (2) patients of a psychiatric hospital who were retained as a single group (IV). The student Groups I, II, and III consisted of 18, 10, and 8 students respectively and were more or less randomly selected from a larger group of volunteers. Group IV consisted of 34 patients of whom 5 were outpatients, 24 were first admissions, and 5 were readmissions.

Cooperation was the main criterion for selection, although an effort was made to select only younger individuals believed to be of at least normal intelligence. The estimate of intelligence was based on both the number of school terms completed and the clinical impression of the attending psychiatrist. In most cases the diagnoses by the psychiatric staff were not available to the examiner at the time of testing and were obtained later. Rorschach results were not utilized in arriving at a diagnosis. Patients were distributed in the general diagnostic categories as follows: psychoneurotic, 10; psychopath, 5; schizophrenic, 16; manic-depressive, manic, 1; epileptic, 1; and alcoholic, 1.


Data for the four groups with regard to sex, age, and education are summarized in Table 1.


Experimental Design. The basic method in 
the experimental design was the test-interval- 
retest method. The subject groups were ex- 
amined as shown in Table 2. 


Groups I and IV were the key experimental groups. Because the one-week interval might not have been optimum, Group II was included to determine if the selected interval was prejudicing the results. To Group III, a control group, the standard Rorschach blots were administered before the achromatic set.

TABLE 2

OUTLINE OF EXPERIMENTAL DESIGN

         Number of                          Interval
Group    Subjects    Test I                 between Tests   Test II
I        18          Achromatic set         1 week          Standard Rorschach
II       10          Achromatic set         6-8 weeks       Standard Rorschach
III      8           Standard Rorschach     1 week          Achromatic set
IV       34          Achromatic set         1 week          Standard Rorschach


Test Procedure. All tests were individually administered. The subjects in Groups I-III were tested in the same room under similar conditions of seating and illumination. It was not possible in the case of all subjects in Group IV to maintain the same consistency in physical environment for both test sessions, but whenever a change of room was required for the second test, the seating and lighting were arranged to duplicate, as closely as possible, those conditions which had prevailed during the first session.
The procedure for test administration, covering the performance proper and the inquiry, was adopted from Klopfer [1]. It was ascertained at the completion of Test I (consisting of the achromatic set for all groups excepting Group III) that all of the students believed they had taken the standard Rorschach test.

The second session was conducted in a manner almost identical with the first. Each subject was informed that he would proceed very much as he had at the first session. The necessary part of the instructions was repeated and the first card was introduced with the statement: “Don’t take anything for granted; that is, tell me what you see now or what it looks like now, regardless of whether it is the same or something different from what you saw before. It might help if you just pretend that I am a new examiner and that I don’t know about last week.”

Upon completion of the second test all subjects were presented with the paired achromatic and standard versions of Card 2 and were asked to select the preferred card and to give a reason for the preference. The same procedure was followed with Cards 3, 8, 9, and 10. This was done to determine whether or not the subjects were able to perceive color.

Scoring and Treatment of Data. The records were scored in order to gain a more accurate count of the number of separate responses. The scoring system of Klopfer was the point of reference. Each of the 140 protocols was scored twice, and whenever a lack of agreement in the two scorings of a single protocol appeared the discrepancy was resolved by reference to Klopfer.


RESULTS AND DISCUSSION 


On Test I there were no significant differences between the four groups in the means or in the variability of the 8-9-10 per cent. An examination of the distributions of the 8-9-10 per cents in the various groups and the variability measures (Table 3) suggests the untenability of the hypothesis that in the absence of color there would be little variability in a group, or that no cases would fall below 30 per cent, a few cases between 40 and 50 per cent, and none above 50 per cent. It appears that the achromatic series has the same potentiality for eliciting the range of deviations in the 8-9-10 per cent as is expected when color is present.

TABLE 3

DISTRIBUTION OF THE 8-9-10 PER CENT FOR TEST I AND TEST II

                   Test I                             Test II
Per cent      Gr. I   Gr. II  Gr. III  Gr. IV    Gr. I   Gr. II  Gr. III  Gr. IV
61-over       1 1
56-60         1 1 1
51-55         0 1 2 1 2 2
46-50         2       1       1        8         1       2       1        5
41-45         2       3       0        2         2       3       2        5
36-40         6       3       1        5         5       3       3        8
31-35         4       1       2        4         4       1       0        6
26-30         2 1 5 2 1 1 4
21-25         1 1 3 1 1 2
16-20         2 1 0
11-15         2 1
Mean          37.7    44.0    38.6     37.3      36.6    40.0    37.4     38.0
Standard
deviation     8.1     8.8     14.7     18.6      9.1     6.0     17.5     9.6

In Test II also, no significant differences 
between any of the groups were found in the 
means or in the variability. 

Because statistical analyses indicated that 
the groups which received the achromatic 
series first could have been assembled from 
the same population on a chance basis, the re- 
sults were pooled. For the total of 62 subjects 
a mean of 38.1 and a standard deviation of 
11.3 resulted. Similarly, the data for all in- 
dividuals receiving the standard set second 
were pooled, and the resulting distribution 
had a mean of 37.9 and a standard deviation 
of 9.0. The means for the two distributions 
are essentially the same; there is a slight de- 
crease in variability. Therefore, the hypothesis 
that color exerts an effect with regard to vari- 
ability in productivity over and above the 
potentialities of the achromatic blots is not 
substantiated. 
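The pooled figures can be checked from the group summaries. The sketch below combines group means and standard deviations into the mean and standard deviation of the whole set. It treats each standard deviation as a descriptive (divide-by-N) value, which is an assumption. Given the Test II means and standard deviations listed for Groups I, II, and IV in Table 3, with the columns read in group order, it reproduces the reported mean of 37.9 and standard deviation of 9.0 for the standard set given second.

```python
import math


def pooled_mean_and_sd(groups):
    """Combine (n, mean, sd) summaries into one mean and SD.

    Each sd is treated as a divide-by-n (descriptive) standard deviation,
    so the total sum of squares is the within-group part plus the
    between-group part.
    """
    total_n = sum(n for n, _, _ in groups)
    grand_mean = sum(n * mean for n, mean, _ in groups) / total_n
    sum_of_squares = sum(
        n * (sd ** 2 + (mean - grand_mean) ** 2) for n, mean, sd in groups
    )
    return grand_mean, math.sqrt(sum_of_squares / total_n)


# Test II summaries for Groups I, II, and IV (standard set given second).
test_ii = [(18, 36.6, 9.1), (10, 40.0, 6.0), (34, 38.0, 9.6)]
mean, sd = pooled_mean_and_sd(test_ii)
total = sum(n for n, _, _ in test_ii)
print(f"N = {total}, mean = {mean:.1f}, SD = {sd:.1f}")  # N = 62, mean = 37.9, SD = 9.0
```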

Sappenfield and Buker using group Ror- 
schach technique also found essentially the 
same mean 8-9-10 per cent for the standard 
and achromatic slides. A comparison of the 
mean 8-9-10 per cent that these investigators 
found in using the achromatic slides with the 
mean 8-9-10 per cent in the present study for 
all subjects given the achromatic cards on 
Test I reveals a significant difference (P <
.01, t = 4.5). Assuming that individual responses
were counted similarly in the two
studies, this significant difference indicates
that the expected mean per cent (31.4, σ_M =
0.45) with group Rorschach is significantly
lower than the expected mean (38.1, σ_M =
1.4) when individual administration is
employed. Furthermore, this difference in the 
mean 8-9-10 per cent suggests that the same 
cut-off points for over- or underproductivity 
probably cannot be used for both group Ror- 
schach and for individually administered Ror- 
schach. 
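The size of this difference can be checked with the usual critical ratio for two independent means, t = (M₁ − M₂) / √(SE₁² + SE₂²). The sketch below reads the figures 0.45 and 1.4 given in parentheses above as the standard errors of the two means. That reading, and the σ/√N convention for the standard error, are assumptions. The sketch gives a t of about 4.56, in line with the reported 4.5. It also shows that 11.3/√62 rounds to the 1.4 given for the individually administered series.

```python
import math


def standard_error(sd, n):
    """Standard error of a mean, using the sd / sqrt(n) convention."""
    return sd / math.sqrt(n)


def t_independent_means(mean_a, se_a, mean_b, se_b):
    """t (critical ratio) for the difference between two independent means."""
    return (mean_a - mean_b) / math.sqrt(se_a ** 2 + se_b ** 2)


# Individual administration: 62 subjects, mean 38.1, SD 11.3.
se_individual = standard_error(11.3, 62)
# Group administration (Sappenfield and Buker): mean 31.4, standard error 0.45.
t = t_independent_means(38.1, 1.4, 31.4, 0.45)
print(f"SE = {se_individual:.2f}, t = {t:.2f}")  # SE = 1.44, t = 4.56
```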


Sappenfield and Buker found essentially 
the same variability in 8-9-10 per cent for the 
achromatic and standard slides. In their study 
one group was given the achromatic slides 
first and the standard slides second, while an- 
other group was given the standard slides first
and the achromatic slides second. The results
from the achromatic slides for both of the 
groups were pooled, as were the results from 
the standard slides for both of the groups. 
This form of design is used specifically as a con- 
trol for retest effects, and the way in which 
the data have been grouped does not allow 
isolation of these effects. 


In the present study, in contrast to that of Sappenfield and Buker, a decrease in variability which was not significant was found on Test II for all groups. This reduction in variability might be a result of a tendency for subjects at the extremes of the distribution to produce a per cent closer to the mean on the second encounter with the test despite the color variable. Using arbitrary cut-off points of below 31 per cent and above 45 per cent, it was found (Table 4) that of the 32 individuals who obtained a per cent above or below the cut-off points on the achromatic Test I, 23 (72 per cent) produced a per cent closer to the mean on Test II, 7 produced a per cent further from the mean, and 2 showed no change. Of the five individuals who produced a per cent above or below the cut-off points on the standard Test I (Group III), all five produced a per cent closer to the mean on Test II. This particular finding has pertinence for clinical application of the Rorschach. Ordinarily a change in the 8-9-10 per cent toward an assumed mean per cent is considered indicative of a change, if not an improvement, in the condition of the subject. However, these results indicate that regardless of the color variable a tendency for a regression toward the norm in the measure exists on a retest with the Rorschach. Since sheer familiarity with the situation may be the source of this tendency, retest changes in the 8-9-10 per cent must be cautiously considered.*

TABLE 4

NUMBER OF SUBJECTS AT THE EXTREMES OF THE DISTRIBUTION ON TEST I WHO PRODUCED A PER CENT CLOSER TO THE MEAN ON TEST II

                  Test I   Test II                      Test I   Test II
                  Number   Closer  Further  No change   Number   Closer  Further  No change
Groups I and II
  Number          6        5       0        1           3        2       1        0
  Mean change     6.8 4.6 5.6
Group III
  Number          3        3       0        0           2        2       0        0
  Mean change     6.6 11
Group IV
  Number          11       7       4        0           12       9       2        1
  Mean change     11.0 3.2 8.8 3.0
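The closer/further/no-change classification behind Table 4 can be stated compactly. The sketch below flags a Test I per cent as extreme when it is below 31 or above 45, as in the text. It then compares the Test I distance from a reference mean with the Test II distance. The function names and the reference mean are hypothetical, because the text does not say which mean was used. Counting a score at an equal distance on the other side of the mean as "further" is an arbitrary choice. The last lines check the proportion reported for the achromatic Test I: 23 of 32 extreme subjects moved closer.

```python
def is_extreme(percent, low=31.0, high=45.0):
    """True when a per cent lies beyond the arbitrary cut-off points."""
    return percent < low or percent > high


def retest_change(first, second, reference_mean):
    """Classify a retest per cent relative to a (hypothetical) reference mean."""
    if second == first:
        return "no change"
    if abs(second - reference_mean) < abs(first - reference_mean):
        return "closer"
    # An equal distance on the other side of the mean also lands here (assumption).
    return "further"


# Counts reported in the text for extreme scorers on the achromatic Test I.
closer, further, unchanged = 23, 7, 2
total = closer + further + unchanged
print(total, round(100 * closer / total))  # 32 72
```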


*A definite conclusion concerning the retest reliability of the 8-9-10 per cent cannot be arrived at from the data because of the color variable.

The possibility still remains that there is some validity in the 8-9-10 per cent as a personality variable. The very fact that there are such dramatic individual differences in the magnitude of the per cent cannot be ignored. The experimental evidence provided by Sappenfield and Buker for group Rorschach, and confirmed in the findings presented here for individually administered Rorschach, indicates that color is not an active factor in the 8-9-10 per cent. This would, therefore, preclude any interpretation of the per cent based on the present assumptions relating productivity to color responsiveness.

However, it is still possible that the interpretation of the 8-9-10 per cent is valid even though a rationale for the interpretation that is based upon assumptions about color cannot be used. Productivity on the totally colored cards, measured by the 8-9-10 per cent, is usually taken as indicative of a subject’s responsiveness to external stimuli. A rationale for this interpretation can be developed without recourse to assumptions about color. First, cursory examination of the blots reveals that the last three cards may be divided more easily into subwholes or parts than may any possible combination of three of the other seven cards. For example, in Klopfer’s presentation of usual detail responses, which represents for the most part a listing of the larger perceptual units in the blots, he lists 35 large detail areas for the first seven cards and 33 such areas for the last three. Next, one need only consider briefly the basic rationale for the Rorschach test as a projective technique: that the individual will approach and react to this relatively ambiguous material as he approaches and reacts to other life situations. It follows, then,
that the individual who exploits more aspects, 
or who examines more areas, when a situation 
offers more, is more responsive to the varied 
details of his everyday environment than the 
individual who does not. This latter individ- 
ual, who makes fewer contacts where more 
possibilities are offered, is the individual who 
fails to respond to the many varied details in 
his environment. 
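The detail-area counts cited from Klopfer can be put on a per-card basis. The short calculation below uses only the two counts quoted in the text. It shows that the last three cards offer more than twice as many listed large detail areas per card as the first seven. It also shows that if every card drew the same number of responses, the last three cards would account for only 30 per cent of a record, below the expected 40 per cent. The variable names are illustrative only.

```python
# Large detail areas listed by Klopfer, as quoted in the text.
areas_first_seven = 35
areas_last_three = 33

per_card_first_seven = areas_first_seven / 7  # 5.0 areas per card
per_card_last_three = areas_last_three / 3    # 11.0 areas per card

# Share of a record falling on Cards VIII-X if all ten cards drew equal responses.
equal_response_share = 100 * 3 / 10           # 30.0 per cent

print(per_card_first_seven, per_card_last_three, equal_response_share)  # 5.0 11.0 30.0
```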

The possible validity of this suggested 
rationale, upon which an interpretation of the 
8-9-10 per cent might be based, can be tested. 
The required procedure would be to correlate 
the magnitude of the 8-9-10 per cent with 
some other measure of responsiveness to en- 
vironmental stimuli. Within the scope of the 
present study no direct measure of respon- 
siveness is available. However, since a diag- 
nosis implies something about the responsive- 
ness of a subject, an estimate might be based 
on diagnosis. If responsiveness is equated to 
productivity, psychopaths should be underpro- 
ductive, schizophrenics might be either over- 
productive or underproductive depending upon 
whether they are overideated or apathetic, and 
neurotics should be overproductive rather 
than underproductive. 


The hypothesis was screened by dividing 
the hospital group (Group IV) into three 
groups: (1) psychopath, (2) schizophrenic, 
and (3) psychoneurotic, a division based on 
the assumptions: first, that these clinical 
classifications are valid, and second, that the 
diagnoses of the subjects are accurate within 
the limits of the three broad classes. Results 
of Test I were used and the cut-off points of 
over 45 per cent, and under 31 per cent were 
employed. Four of the five psychopaths were 
underproductive. The schizophrenic group 
(16 subjects), of whom six produced more 
than 45 per cent and six produced less than 
31 per cent, is composed of both apathetic and 
overideated patients.⁵ Of the ten psychoneurotics
five produced over 45 per cent and two
produced under 31 per cent. Although the re- 
sults are not conclusive they are definitely not 
contrary to the hypothesis. 
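The 8-9-10 per cent is the share of a subject's total responses given to Cards VIII, IX, and X. A minimal sketch of the screening step used above, assuming per-card response counts are available; the function names and the three protocols are hypothetical, and only the cut-offs (over 45 per cent, under 31 per cent) come from the text.

```python
def eight_nine_ten_percent(responses_per_card):
    """Percentage of all responses given to Cards VIII, IX and X.

    `responses_per_card` holds the number of responses to Cards I..X in order.
    """
    if len(responses_per_card) != 10:
        raise ValueError("expected response counts for ten cards")
    total = sum(responses_per_card)
    if total == 0:
        raise ValueError("protocol has no responses")
    # Indices 7, 8 and 9 correspond to Cards VIII, IX and X.
    return 100.0 * sum(responses_per_card[7:]) / total


def classify_productivity(percent, upper=45.0, lower=31.0):
    """Label a protocol with the cut-offs used for the hospital group."""
    if percent > upper:
        return "overproductive"
    if percent < lower:
        return "underproductive"
    return "within limits"


# Hypothetical protocols, not data from the study.
protocols = {
    "subject 1": [3, 2, 2, 2, 2, 2, 2, 4, 3, 3],
    "subject 2": [2, 2, 2, 2, 2, 2, 2, 5, 4, 5],
    "subject 3": [4, 3, 3, 3, 3, 3, 3, 2, 2, 1],
}
for name, counts in protocols.items():
    pct = eight_nine_ten_percent(counts)
    print(f"{name}: {pct:.1f} per cent, {classify_productivity(pct)}")
```

A protocol exactly at 45 or 31 per cent is treated as within limits, since the text speaks of "over 45" and "under 31."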


SUMMARY 

1. It appears that a set of achromatic Ror- 
schach cards has the potentiality for eliciting 
the same range of deviations in the 8-9-10 
per cent as is expected when color is present. 
Color does not bring about variability in pro- 
ductivity (8-9-10 per cent) over and above 
the potentialities of the achromatic blots. 
Therefore, individual differences in the 8-9- 
10 per cent cannot be attributed to color. 

2. There is a tendency for a regression to- 
ward the norm on the retest in the case of in- 
dividuals producing at the extremes of the 
distribution despite the color variable. This 
does not necessarily mean that the 8-9-10 per
cent could not be a reliable measure when
based on the first Rorschach test taken by a
subject.


3. An alternative basis for the current in- 
terpretation of the 8-9-10 per cent is present- 
ed. Data are presented which suggest that it 
is advisable to investigate further the possible 
validity of this rationale for interpretation be- 
fore discarding the 8-9-10 per cent as having 
no validity for personality study. 

Received May 25, 1950. 
REFERENCES 
1. Klopfer, B., and Kelley, D. M. The Rorschach
technique. Yonkers, N. Y.: World Book Co., 1942.
2. Sappenfield, B. R., and Buker, S. L. Validity
of the Rorschach 8-9-10 per cent as an indicator
of responsiveness to color. J. consult. Psychol.,
1949, 13, 268-271.





⁵At the time the hypothesis examined here
arose, it was impossible to secure a criterion
outside of the Rorschach upon which to base
a further breakdown of the total schizophrenic group.
A study designed to test this hypothesis would provide
for a division of the schizophrenic group into
apathetic and overideated types.


NORMS FOR “SHOCK” IN THE RORSCHACH 


HERBERT SANDERSON 


JEWISH COMMUNITY SERVICE SOCIETY, BUFFALO, N. Y. 


THOSE familiar with the Rorschach
technique are aware of the fact that 
different individuals require different 
amounts of time to respond to the ten cards 
(reaction time). Furthermore, it is recog- 
nized that the same individual will show a 
different reaction time to each card as it is 
presented to him. The clinical significance of 
reaction time lies in the assumption that af- 
fectively meaningful cards require a longer 
time to elicit a scorable response. It is held, 
for example, that the chromatic cards usually 
produce a delayed response because of the 
psychological significance of color. Rorschach 
himself thought that a particularly prolonged 
reaction time was characteristic of neurotics 
and called it a “neurotic shock.” Today, the 
term has been largely replaced by “color 
shock,” for it was found that neurotics are 
not the only ones who are affected by the 
colored cards. 
The term “color shock” as used in the
Rorschach literature designates a complex
emotional reaction which is aroused when a
card containing color is presented to the subject.
The presence of color shock can be recognized
by several characteristic “signs,” one
of which is a delayed reaction to the card. Although
it is generally held that the severity of
the shock is directly proportional to the time
required to produce the first scorable response,
there seems to be little agreement on
the minimum delay that must elapse before it
can be regarded as clinically significant.
Klopfer and Kelley [3, p. 214] suggest
that the difference between the average for
the chromatic and the achromatic cards must
be at least 10 seconds before any interpretation
can be made. Such a method does not
appear entirely satisfactory, however. The use
of average reaction time rather than actual
reaction time may obliterate the true picture
if any one of the cards evokes an exceptionally
delayed or a particularly fast response. In the
second place, a direct comparison between
the chromatic and the achromatic cards loses
sight of the fact that the latter cards may
evoke a delayed reaction time because of
“shading shock” or “sex shock.” In other
words, although a subject may show true
color shock, its presence (as revealed by average
reaction time) will not be apparent if he
should also block on Cards IV and VI. Beck
set up an empirically derived critical score
of 1.5-2.0, using as reference base “the average
time for the first response (T/first R)
for all of the ten figures” [1, p. 37].
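The masking problem with the Klopfer and Kelley average-difference rule can be made concrete. This sketch computes the mean reaction time to the chromatic cards (taken here as II, III, VIII, IX, and X) minus the mean for the achromatic cards (I, IV, V, VI, and VII) and applies the 10-second rule. The reaction times are invented for illustration, and the function name is hypothetical.

```python
from statistics import mean

CHROMATIC = ("II", "III", "VIII", "IX", "X")
ACHROMATIC = ("I", "IV", "V", "VI", "VII")


def color_minus_achromatic(rt):
    """Mean chromatic reaction time minus mean achromatic reaction time (seconds)."""
    return mean(rt[c] for c in CHROMATIC) - mean(rt[c] for c in ACHROMATIC)


# Hypothetical subject who is slow on every colored card.
color_only = {"I": 8, "II": 20, "III": 20, "IV": 8, "V": 8,
              "VI": 8, "VII": 8, "VIII": 20, "IX": 20, "X": 20}
# The same subject, who in addition blocks on Cards IV and VI.
also_blocks = dict(color_only, IV=30, VI=30)

for label, rt in (("color only", color_only), ("also IV and VI", also_blocks)):
    diff = color_minus_achromatic(rt)
    print(f"{label}: difference {diff:.1f} s, criterion met: {diff >= 10}")
```

The colored cards are equally slow in both cases. In the second case, however, blocking on Cards IV and VI raises the achromatic average and hides the color delay, which is the objection raised above.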

The present study attempted to set up
statistically significant criteria for “shock” as
measured by the reaction time to each card.
Since different cards elicit a different reaction
time, the examiner cannot be certain whether the delay
to any one particular card is due to the
person’s inability to respond quickly because
of the subjective implications conveyed by the 
card, or because the card by virtue of its ap- 
pearance does not lend itself to immediate in- 
terpretation. If one should establish, for ex- 
ample, that most people (in our culture) re- 
quire twice as much time to respond to Card 
VI as they do to Card V, an individual thus 
responding will still be operating within the 
“norms.” It is only when the reaction time
for a card considerably exceeds the time limits
set by the group that one may infer that the
particular card has some special meaning to
the subject. The
question then arises, what is the normal re- 
action time to the ten cards? 


To answer this, the writer tabulated the 


reaction time to each card for 50 Rorschach 
protocols of adult clients who came to the 












agency¹ for vocational guidance. The mean
reaction time and the standard error of the
mean were computed for each card and are presented
in Table 1. It can be seen at a glance that
the normal reaction time varies profoundly 
from card to card. Since the present investi- 
gation does not concern itself with causative 
factors, it will suffice to note that, for what- 
ever reason, every card has its own specific 
reaction time. 


TABLE 1

MEAN REACTION TIME AND RELATIVE REACTION
TIME TO THE RORSCHACH CARDS

        Mean Reaction            Relative
Card    Time (Seconds)   σM      Reaction Time
I        7.76            1.98    1.00
II      11.66            3.33    1.50
III      9.22            2.37    1.19
IV      13.02            2.66    1.68
V        9.66            2.67    1.24
VI      14.70            5.66    1.89
VII     14.26            3.70    1.84
VIII    11.46            3.71    1.48
IX      15.56            5.53    2.01
X       13.54            4.37    1.74




Using Card I as a base, the relative re- 
action time was calculated as indicated in the 
third column of Table 1. It is worth noting 
that Cards IX and VI, which are among 
those most frequently rejected in the order 
named [2, p. 72], are the very ones which 
evoked the longest reaction time in the pre- 
sent study. Card V, similarly, is rejected 
least of the ten, and accordingly has a relative 
reaction time value of only 1.24. At this point 
one is tempted to speculate whether delayed 
reaction time and rejection are not etiologi- 
cally related. 
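Each relative reaction time in Table 1 is the card's mean divided by the mean for Card I. The short sketch below performs that division. The dictionary and function names are illustrative. Card V's mean is entered as 9.66 seconds, the value implied both by its relative time of 1.24 and by the midpoint of its range in Table 2.

```python
# Mean reaction times in seconds, from Table 1 (Card V taken as 9.66).
MEAN_RT = {
    "I": 7.76, "II": 11.66, "III": 9.22, "IV": 13.02, "V": 9.66,
    "VI": 14.70, "VII": 14.26, "VIII": 11.46, "IX": 15.56, "X": 13.54,
}


def relative_reaction_times(mean_rt, base="I"):
    """Express each card's reaction time as a multiple of the base card's."""
    return {card: t / mean_rt[base] for card, t in mean_rt.items()}


relative = relative_reaction_times(MEAN_RT)
for card, r in sorted(relative.items(), key=lambda item: item[1], reverse=True):
    print(f"{card:>4}: {r:.2f}")
```

Sorted in descending order, the ratios list Cards IX and VI first, the same two cards noted above as most often rejected.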


Since the fluctuation in the reaction time 
is so great, it may be advisable to set up in- 
dividual reaction time limits for each card. 
Such a procedure would enable the examiner 
to tell at a glance whether the subject re- 
sponds unusually quickly or unusually slowly 
to the different cards as compared to the tentative
norms. In other words, a comparison
with the average reaction time to each card
will indicate whether the subject deviates
“significantly” from the general pattern, and
the extent of such deviation. A particularly
marked deviation in the positive direction
(i.e., long interval) may be considered as one
sign of shock. The clinical meaning of very
quick responses is not entirely clear, although
they too are probably significant.

¹The Rorschach was administered as a part of the
guidance program to adult clients who came voluntarily
to be assisted with their educational or vocational
plans. The guidance service is community
supported and is sponsored by the Jewish Community
Service Society of Buffalo.

A deviation of 2 sigmas was selected as a 
limit. Scores falling above this value may be 
viewed as statistically significant. The 2- 
sigma limit corresponds very closely to the 5 
per cent level of confidence and may be con- 
sidered entirely adequate for the present 
problem [4, p. 175]. The reaction time range 
for each card is presented in Table 2. 


TABLE 2

THE RANGE OF NORMAL REACTION TIME FOR THE
RORSCHACH CARDS

        Range in         Range in
Card    Seconds          Whole Numbers
I       4.80-10.72       5-11
II      5.00-18.32       5-18
III     4.46-13.98       5-14
IV      7.68-18.36       8-18
V       4.32-15.00       4-15
VI      4.38-25.02       4-25
VII     6.86-21.66       7-22
VIII    4.04-18.88       4-19
IX      4.50-26.62       5-27
X       4.80-22.28       5-22
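Each range in Table 2 is the card's mean plus or minus two sigmas. The sketch below applies that rule to the values printed in Table 1 and then checks how much of a normal curve lies beyond 2 sigmas. Card V's mean is taken as 9.66. For eight cards this reproduces Table 2 to within .02 second. For Cards I and VI, however, the printed limits correspond to sigmas of 1.48 and 5.16 rather than the 1.98 and 5.66 shown in Table 1, which suggests a misprint in one of the two tables.

```python
import math

# (mean, sigma) per card in seconds, as printed in Table 1; Card V's mean
# is taken as 9.66, the midpoint of its range in Table 2.
TABLE_1 = {
    "I": (7.76, 1.98), "II": (11.66, 3.33), "III": (9.22, 2.37),
    "IV": (13.02, 2.66), "V": (9.66, 2.67), "VI": (14.70, 5.66),
    "VII": (14.26, 3.70), "VIII": (11.46, 3.71), "IX": (15.56, 5.53),
    "X": (13.54, 4.37),
}


def normal_range(mean, sigma, k=2.0):
    """Return the limits mean - k*sigma and mean + k*sigma."""
    return mean - k * sigma, mean + k * sigma


ranges = {card: normal_range(m, s) for card, (m, s) in TABLE_1.items()}
for card, (low, high) in ranges.items():
    print(f"{card:>4}: {low:5.2f} - {high:5.2f}")

# Two-tailed probability of a normal deviate falling more than 2 sigmas
# from the mean.
p_two_tailed = math.erfc(2 / math.sqrt(2))
print(f"P(|z| > 2) = {p_two_tailed:.4f}")
```

About 4.6 per cent of a normal distribution lies outside the 2-sigma limits. This is close to the 5 per cent level cited in the text, which strictly corresponds to 1.96 sigmas.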





To recapitulate, three criteria for evaluat- 
ing the reaction time to the Rorschach cards 
have been set up: the specific average reaction
time of each card, the relative reaction time 
of each card (with Card I used as a base) 
and finally, the reaction time range as ex- 
pressed in terms of probability. The writer 
holds that the specific reaction time and the 
range concepts are particularly useful with 
individual protocols that essentially approxi- 
mate the tentative norms given in Table 1. 
The relative reaction time concept is more ap- 
plicable to records that show wide inter-card 
fluctuation in their reaction time. An illustra- 
tion covering the two situations may be in 
order. Individual A’s reaction time (in sec- 
onds) to the ten cards was as follows: I-12, 
















II-9, III-11, IV-15, V-6, VI-20, VII-14,
VIII-10, IX-21, X-18. A superficial exami- 
nation of these values may suggest that delayed 
reaction time was evinced on Cards VI 
and IX. Consulting tables containing the 
specific and the range values, however, will 
show that the obtained reaction time does not 
deviate significantly from the averages of the 
group and does not exceed the 2-sigma limits. 
Subject B, on the other hand, attained the fol- 
lowing reaction time values: I-7, II-8, III-7, 
IV-9, V-6, VI-8, VII-18, VIII-10, IX-11,
X-9. Although a reaction time of 18 seconds 
on Card VII does not exceed the level of 
significance as compared to the group as a 
whole (the range for Card VII being 7-22 
in round numbers) the deviation is signifi- 
cant in terms of B’s individual reaction time 
to the rest of the cards. Using B’s Card I as 
a base, we get 18/7, or 2.57, whereas the group’s
relative reaction time for Card VII is only 1.84. It is,
therefore, quite apparent that B “blocked” on 
that particular card. 
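The two checks can be applied to Subject B using the whole-number ranges of Table 2 and the group relative times of Table 1, as in the sketch below. The rule for the relative criterion is a simplification adopted here for illustration: a card is flagged when the subject's own ratio to Card I exceeds the group's ratio. The text itself judges the size of the excess by inspection. `evaluate_protocol` is a hypothetical name.

```python
# Whole-number ranges from Table 2 and group relative reaction times from Table 1.
RANGE = {"I": (5, 11), "II": (5, 18), "III": (5, 14), "IV": (8, 18), "V": (4, 15),
         "VI": (4, 25), "VII": (7, 22), "VIII": (4, 19), "IX": (5, 27), "X": (5, 22)}
GROUP_RELATIVE = {"I": 1.00, "II": 1.50, "III": 1.19, "IV": 1.68, "V": 1.24,
                  "VI": 1.89, "VII": 1.84, "VIII": 1.48, "IX": 2.01, "X": 1.74}


def evaluate_protocol(rt):
    """Return (cards outside the group range, cards whose own relative
    reaction time RT/RT(Card I) exceeds the group's relative value)."""
    outside_range = [c for c, t in rt.items()
                     if not RANGE[c][0] <= t <= RANGE[c][1]]
    above_relative = [c for c, t in rt.items()
                      if t / rt["I"] > GROUP_RELATIVE[c]]
    return outside_range, above_relative


subject_b = {"I": 7, "II": 8, "III": 7, "IV": 9, "V": 6,
             "VI": 8, "VII": 18, "VIII": 10, "IX": 11, "X": 9}
outside, above = evaluate_protocol(subject_b)
print("outside group range:", outside)
print("above group relative time:", above)
print("Card VII ratio:", round(subject_b["VII"] / subject_b["I"], 2))
```

Only Card VII is flagged, and only by the relative criterion. This matches the conclusion that B blocked on that card even though the 18-second reaction time lies inside the group range.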


SUMMARY 


The present investigation concerns itself 
with the question of what constitutes a delayed 


reaction time to the Rorschach ink blots. 
Three statistical criteria were devised, based 
on the examination of 50 Rorschach protocols. 
The criteria are: the specific average reaction
time of each card, the relative reaction
time of each card (RT for any one card/RT
for Card I), and the range outside of which
reaction time must be considered as
statistically significant. It is held that the application
of the tentative norms to the individual
Rorschach records will further objectify
the meaning of increase in reaction time,
especially as it applies to the concept of “shock.”


Received July 26, 1950. 


REFERENCES 


1. Beck, S. J. Rorschach’s test. New York: Grune
& Stratton, 1949. Vol. II.

2. Bochner, Ruth, and Halpern, Florence. The
clinical application of the Rorschach test. New
York: Grune & Stratton, 1948.

3. Klopfer, B., and Kelley, D. McG. The Rorschach
technique. Yonkers, N. Y.: World Book
Co., 1946.

4. Peters, C. C., and Van Voorhis, W. R. Statistical
procedures and their mathematical bases. New
York: McGraw-Hill, 1940.











THE CHANCE DISTRIBUTION OF 
SZONDI VALENCES¹


JACOB COHEN 


THE COLLEGE OF THE CITY OF NEW YORK AND BRONX VETERANS HOSPITAL 


IN connection with consultation on a doctoral
dissertation [1], the author was presented with
a problem in the statistical analysis

of Szondi data, the solution of which may be 
of general interest. 


The problem was this: To what extent is
the distribution of valences (plus, minus,
open, and ambivalent) of any Szondi factor for
any group of individuals a function of the
stimulus values of the six photographs representing
that factor rather than a function of
chance? For example, let us assume that 100
clinically normal adult males are administered
the Szondi and that the valence distribution
with regard to the h factor is as follows: 53
of the 100 subjects give plus h, 25 give minus,
10 open, and 12 ambivalent. In order to evaluate
this or any other obtained distribution of
valences for any given factor, it is necessary
to develop the theoretical distribution
which would obtain on the assumption of random
likes, dislikes, and no-choices of the photographs
representing the factor. In effect, this
is the theoretically expected distribution were
the test to be given to a large number of blindfolded
subjects.


The solution proceeds by evaluating the 
probability of the occurrence of from zero 
through six choices of the factor in question, 
where choice is defined as the selection of the 
photograph representing the factor as either 
liked or disliked. The structure of the Szondi 
test is such that each of the eight factors is rep- 
resented once in each of six sets of photographs. 


¹Reviewed in the Veterans Administration and
published with the approval of the Chief Medical 
Director. The statements and conclusions published 
by the author are the result of his own study and 
do not necessarily reflect the opinion or policy of the 
Veterans Administration. 


The subject makes four choices in each set (two
likes and two dislikes); therefore, the probability
(p) of choosing the factor in question
in each of the sets is 4/8, or 1/2.

It then becomes possible, by expanding the
binomial $(p + q)^6$, to compute the theoretical
expectation of differing numbers of choices
from zero through six:

$$(p + q)^6 = p^6 + 6p^5q + 15p^4q^2 + 20p^3q^3 + 15p^2q^4 + 6pq^5 + q^6$$

where p is the probability of a choice and q
of a non-choice. Since p = q = 1/2, the $p^{6-r}q^{r}$
parts of each term equal $(1/2)^6$, which equals
1/64. The terms of the expansion then lead
to the probabilities given in Table 1.


TABLE 1
THE THEORETICAL PROBABILITIES OF DIFFERING
NUMBERS OF CHOICES OF A GIVEN FACTOR IN
THE SZONDI SITUATION

Number of                  Relative
Choices       Frequency    Probability
6             1            .0156
5             6            .0938
4             15           .2344
3             20           .3125
2             15           .2344
1             6            .0938
0             1            .0156
Total         64           1.0001
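The sketch below reproduces Table 1 from the expansion of $(p + q)^6$ with p = q = 1/2. Exact fractions are used so that the rounding in the printed column is easy to see. The function name is illustrative.

```python
from fractions import Fraction
from math import comb


def choice_distribution(sets=6, p=Fraction(1, 2)):
    """Probability that the factor's photograph is chosen in exactly k of the
    `sets` sets (k = sets..0): the terms of the binomial (p + q)**sets."""
    q = 1 - p
    return {k: comb(sets, k) * p**k * q**(sets - k) for k in range(sets, -1, -1)}


dist = choice_distribution()
for k, prob in dist.items():
    print(k, prob * 64, f"{float(prob):.4f}")
print("total", sum(dist.values()))
```

The exact probabilities sum to 1. The total of 1.0001 in the table comes from rounding each entry to four places.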


The analysis thus far has not distinguished 
between likes and dislikes. It now becomes 
necessary to make this distinction for each of 
the seven possible numbers of choices (zero 
through six). Each will now be evaluated in 
order, starting with the situation where all 
six of the photographs representing the factor 
have been chosen. 

Of the cases where six choices in the factor 
have been made, since like and dislike are 






equiprobable, the distribution of possible com- 
binations of number liked and number dis- 
liked is obtainable by means of the same bino- 
mial expansion as above. In this instance, 
however, p is the probability of choosing a 
picture as liked and q the probability of 
choosing it as disliked. 


$$(p + q)^6 = p^6 + 6p^5q + 15p^4q^2 + 20p^3q^3 + 15p^2q^4 + 6pq^5 + q^6$$
This leads to Table 2, which is mathe- 
matically identical with Table 1 but which 
has different meaning for the present problem. 


TABLE 2

THE PROBABILITIES OF THE VARIOUS COMBINATIONS
OF LIKES AND DISLIKES WHERE SIX CHOICES ARE
MADE IN A FACTOR, AND THEIR SUMMATION
AS SZONDI VALENCES

                                 Relative              Probability
Likes   Dislikes   Frequency   Proportion   Valence   of Valence
6       0          1           .0156        +
5       1          6           .0938        +         .3438
4       2          15          .2344        +
3       3          20          .3125        ±         .3125
2       4          15          .2344        −
1       5          6           .0938        −         .3438
0       6          1           .0156        −
Total              64          1.0001                 1.0001


Table 2 is interpreted as follows: Of those
cases where six choices in a factor are made,
.3438 are plus (+), .3438 are minus (−),
and .3125 are ambivalent (±).


TABLE 3

THE PROBABILITIES OF THE VARIOUS COMBINATIONS
OF LIKES AND DISLIKES WHERE FIVE CHOICES ARE
MADE IN A FACTOR, AND THEIR SUMMATION
AS SZONDI VALENCES

                                 Relative              Probability
Likes   Dislikes   Frequency   Proportion   Valence   of Valence
5       0          1           .0313        +
4       1          5           .1563        +         .1876
3       2          10          .3125        ±
2       3          10          .3125        ±         .6250
1       4          5           .1563        −
0       5          1           .0313        −         .1876
Total              32          1.0002                 1.0002








A similar procedure is followed for those 
cases where five choices are made for a specific 
factor. The appropriate binomial is expanded: 


$$(p + q)^5 = p^5 + 5p^4q + 10p^3q^2 + 10p^2q^3 + 5pq^4 + q^5$$

The resulting probabilities are given in Table 3.


From Table 3 it is seen that of those cases 
where five choices in a factor are made, .1876 
are plus, .1876 are minus, and .6250 are 
ambivalent. 


Following a similar procedure, for the cases 
where four choices are made of the photo- 
graphs representing a given factor, the ap- 
propriate binomial is expanded: 

$$(p + q)^4 = p^4 + 4p^3q + 6p^2q^2 + 4pq^3 + q^4,$$

and Table 4 results.


TABLE 4

THE PROBABILITIES OF THE VARIOUS COMBINATIONS
OF LIKES AND DISLIKES WHERE FOUR CHOICES ARE
MADE IN A FACTOR, AND THEIR SUMMATION
AS SZONDI VALENCES

                                 Relative              Probability
Likes   Dislikes   Frequency   Proportion   Valence   of Valence
4       0          1           .0625        +
3       1          4           .25          +         .3125
2       2          6           .375         ±         .375
1       3          4           .25          −         .3125
0       4          1           .0625        −
Total              16          1.0000                 1.0000


Table 4 yields the information that of those 
cases where four choices are made for the 
factor, .3125 are plus, .3125 are minus, and 
.375 are ambivalent. 

For the cases where three choices are made 
in the factor, the binomial expansion is: 


$$(p + q)^3 = p^3 + 3p^2q + 3pq^2 + q^3,$$
and Table 5 results. 


From Table 5 it is seen that of those cases
where three choices are made in the factor,
half are plus and half are minus.

TABLE 5

THE PROBABILITIES OF THE VARIOUS COMBINATIONS
OF LIKES AND DISLIKES WHERE THREE CHOICES ARE
MADE IN A FACTOR, AND THEIR SUMMATION
AS SZONDI VALENCES

                                 Relative              Probability
Likes   Dislikes   Frequency   Proportion   Valence   of Valence
3       0          1           .125         +
2       1          3           .375         +         .5
1       2          3           .375         −         .5
0       3          1           .125         −
Total              8           1.000                  1.0


The binomial expansion for the cases where 
two choices have been made in the factor is 


$$(p + q)^2 = p^2 + 2pq + q^2,$$
and Table 6 results. 


TABLE 6

THE PROBABILITIES OF THE VARIOUS COMBINATIONS
OF LIKES AND DISLIKES WHERE TWO CHOICES ARE
MADE IN A FACTOR, AND THEIR SUMMATION
AS SZONDI VALENCES

                                 Relative              Probability
Likes   Dislikes   Frequency   Proportion   Valence   of Valence
2       0          1           .25          +         .25
1       1          2           .5           O         .5
0       2          1           .25          −         .25
Total              4           1.00                   1.00


From Table 6 it is seen that where two 
choices in the factor are made, .25 of them 
are plus, .25 are minus, and .5 are open (O). 
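Tables 2 through 6 share one construction: given n choices, each photograph is equally likely to be liked or disliked, so the like/dislike split follows $(p + q)^n$. Each split is then assigned a valence. The `valence` rule below is reconstructed from the tables themselves (ambivalent when both likes and dislikes number at least two and differ by at most one; a 1-1 split or fewer than two choices is open). It is not quoted from a Szondi manual. In the code, ASCII "-" stands for the minus valence.

```python
from fractions import Fraction
from math import comb


def valence(likes, dislikes):
    """Valence of a like/dislike split, as classified in Tables 2-6."""
    n = likes + dislikes
    if n <= 1 or (likes == 1 and dislikes == 1):
        return "O"
    if likes >= 2 and dislikes >= 2 and abs(likes - dislikes) <= 1:
        return "±"
    return "+" if likes > dislikes else "-"


def conditional_valences(n):
    """Valence distribution given that n photographs of the factor were
    chosen, each equally likely to be liked or disliked."""
    dist = {}
    for likes in range(n + 1):
        v = valence(likes, n - likes)
        dist[v] = dist.get(v, 0) + Fraction(comb(n, likes), 2 ** n)
    return dist


for n in range(6, 1, -1):
    print(n, {v: f"{float(p):.4f}" for v, p in conditional_valences(n).items()})
```

With exact fractions, five choices give 6/32 = .1875 for plus. The .1876 in Table 3 is the sum of two already-rounded entries.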


Whenever zero or one photograph repre- 
senting a factor is chosen, the valence for that 
factor is open (O). From Table 1, the prob- 
ability of one choice is .0938 and of zero 


choices .0156. 


In order to obtain the over-all probability 
for each valence (+, —, 4, or O), the prob- 
abilities must be summed over all the possible 
numbers of choices. For example, the prob- 
ability of choosing six photographs in a factor 
was found to be .0156 (Table 1). Of these, 
.3438 were found to be of plus valence 
(Table 2). Therefore, .3438 times .0156, or 
.0054, is the probability of six photographs of 




a factor being chosen whose like-dislike distribution
is such as to be considered plus. The
probabilities for other numbers of choices are
similarly determined, and these are summed
for the over-all probability of a plus valence:

For six choices:     (.3438)(.0156) = .0054
For five choices:    (.1876)(.0938) = .0176
For four choices:    (.3125)(.2344) = .0732
For three choices:   (.5)(.3125)    = .1563
For two choices:     (.25)(.2344)   = .0586
Probability of a plus valence = Total .3111


Since minuses are perfectly symmetrical
with pluses, the over-all probability
of a minus valence is also .3111.

The probability of an ambivalent (±) valence
on a chance basis for the various numbers of
choices is as follows:

For six choices:     (.3125)(.0156) = .0049
For five choices:    (.625)(.0938)  = .0586
For four choices:    (.375)(.2344)  = .0878
Probability of an ambivalent valence = Total .1513


The probability of an open (O) valence on
a chance basis for the various numbers of
choices is as follows:

For two choices:     (.5)(.2344) = .1172
For one choice:      .0937
For zero choices:    .0156
Probability of an open valence = Total .2265


The distribution of factor valences on the 
assumption of chance choices is therefore as 
follows: 


Plus          .3111
Minus         .3111
Open          .2265
Ambivalent    .1513
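The same distribution can be cross-checked without the step-by-step decomposition by enumerating all 3^6 outcomes of the six sets. In each set of eight photographs the subject likes two and dislikes two, so a given factor's photograph is liked with probability 1/4, disliked with probability 1/4, and passed over with probability 1/2. The valence sets below are transcribed from Tables 2-6. The function name is illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# (likes, dislikes) splits and their valences, transcribed from Tables 2-6;
# every other split (a 1-1 split, or fewer than two choices) is open.
PLUS = {(6, 0), (5, 1), (4, 2), (5, 0), (4, 1), (4, 0), (3, 1), (3, 0), (2, 1), (2, 0)}
MINUS = {(d, l) for l, d in PLUS}
AMBIVALENT = {(3, 3), (3, 2), (2, 3), (2, 2)}

# Chance of each outcome for the factor's photograph within one set.
OUTCOME_P = {"like": Fraction(1, 4), "dislike": Fraction(1, 4), "none": Fraction(1, 2)}


def chance_valence_distribution(sets=6):
    dist = Counter()
    for outcome in product(OUTCOME_P, repeat=sets):
        p = Fraction(1)
        for o in outcome:
            p *= OUTCOME_P[o]
        split = (outcome.count("like"), outcome.count("dislike"))
        if split in PLUS:
            dist["+"] += p
        elif split in MINUS:
            dist["-"] += p
        elif split in AMBIVALENT:
            dist["±"] += p
        else:
            dist["O"] += p
    return dist


dist = chance_valence_distribution()
for v in ("+", "-", "O", "±"):
    print(v, dist[v], f"{float(dist[v]):.4f}")
```

The exact values are 1274/4096 for plus and for minus, 928/4096 for open, and 620/4096 for ambivalent (about .3110, .2266, and .1514). These differ from the figures above only through rounding in the intermediate products.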


Utilizing the above theoretical probabili- 
ties, the hypothesis of random choice for any 
obtained distributions of valences on any 
Szondi factor can be tested by means of chi- 
square. 

The hypothetical problem presented in the
second paragraph can now be solved. Table 7
presents the valence distribution for the h factor
given by 100 normal male adults together
with the theoretically expected distribution on
the assumption of random choice.

TABLE 7
THEORETICAL AND OBTAINED FREQUENCY DISTRIBUTIONS
OF THE VALENCES FOR THE SZONDI h FACTOR
FOR 100 NORMAL ADULT MALES

               +      −      O      ±
Theoretical    31     31     23     15
Obtained       53     25     10     12


Since we are testing the divergence of an 
obtained from a theoretical distribution, the 
appropriate statistical model is that of chi- 
square. The chi-square value for Table 7 is
found to be 24.7, which, for three degrees of 
freedom, is significant beyond the .001 level. 
The hypothesis that the obtained distribution 
is explicable on the assumption of random 
choice is thus found to be untenable. 
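The chi-square test above can be reproduced from the rounded frequencies in Table 7. The tabled critical value of 16.27 for p = .001 with 3 degrees of freedom is entered by hand to keep the sketch dependency-free.

```python
# Frequencies from Table 7, in the order plus, minus, open, ambivalent.
theoretical = [31, 31, 23, 15]
obtained = [53, 25, 10, 12]

chi_square = sum((o - e) ** 2 / e for o, e in zip(obtained, theoretical))
df = len(obtained) - 1
CRITICAL_001_DF3 = 16.27  # tabled chi-square for p = .001 with 3 df
print(f"chi-square = {chi_square:.2f}, df = {df}, "
      f"significant beyond .001: {chi_square > CRITICAL_001_DF3}")
```

Where SciPy is available, `scipy.stats.chisquare(obtained, theoretical)` computes the same statistic together with its p value. Using the unrounded expected frequencies (31.10, 31.10, 22.66, 15.14) gives a somewhat smaller value of about 24.3, which leaves the conclusion unchanged.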


SUMMARY 


Using probability theory, the theoretical val- 
ence distribution for Szondi factors was 
found on the assumption of random choice of 
photographs. An example of its use in re- 
search with the Szondi was provided. 

Received July 17, 1950. 


REFERENCE 


1. Goldman, G. D. An investigation of the similarities
in personality structure of groups of idiopathic
epileptics, hysterical convulsives, and neurological
patients. Unpublished doctor’s dissertation,
New York Univ., 1950.








THE INTERNAL STRUCTURE OF THE MMPI* 


WILLIAM MARSHALL WHEELER


VETERANS ADMINISTRATION NEUROPSYCHIATRIC HOSPITAL, LOS ANGELES, CALIFORNIA 


KENNETH B. LITTLE AND GEORGE F. J. LEHNER


UNIVERSITY OF CALIFORNIA, LOS ANGELES 


THE MMPI is a widely used diagnostic
tool. Many studies have been made concerning
its use in various situations and
with various types of cases, its relationship to 
other tests, the reliability and validity of the 
various scales, and the relationships among the 
scales as reflected in “patterns” or “profiles” of 
scores. Little, however, seems to have been done 
to determine the nature and extent of the 
interrelationships among the various scales of 
the total test. Information concerning the in- 
tercorrelations of the various diagnostic scales 
is needed. The present study attempts to 
provide information about scale interrelation- 
ships by utilizing a factor analysis approach. 
An examination of the literature fails to show 
any factorial study of the scales of the MMPI, 
although in a study by Cottle [2] a factorial 
analysis was made of the MMPI in relation to 
the Strong, the Kuder, and Bell Inventories. 


Another aim of the present study is to ap- 
praise statistically the possibility of making dy- 
namic interpretations in terms of different 
scale scores. If, for example, one factor were 
found, it would imply that the test measures 
degree of disturbance, not kind of disturbance,
indicating that the test served mainly as a 
screening device. If, on the other hand, several 
different factors should emerge, each fairly 
closely identified with a scale, the use of the 
test in its present form for differential or dy- 
namic interpretations would be supported. 


*Reviewed in the Veterans Administration and
published with the approval of the Chief Medical
Director. The statements and conclusions published
by the authors are the result of their own study and
do not necessarily reflect the opinion or policy of
the Veterans Administration.

DESCRIPTION OF MMPI

The scales available in the MMPI are
hypochondriasis (Hs), depression (D), hysteria
(Hy), psychopathic personality (Pd),
paranoia (Pa), psychasthenia (Pt), masculinity-femininity
of interest (Mf), schizophrenia
(Sc), and hypomanic trends (Ma).
In addition to these, three other scales, the
question score (?), the lie score (L), and a
validity score (F), are used, respectively, to
assist in assessing for a subject whether he
understands the items, whether he tries to
place himself in too favorable a light, or
whether he is cooperative. Recent forms of
the test include an additional scale (K) described
as a “correction factor” for sharpening
the discriminatory value of several of the
other scales. The total of 504 items utilized
in these various scales represents 60 (arbitrarily)
defined areas of human activity as
described by Hathaway and McKinley [5].
In interpretation, items responded to as
either “True” or “False” are scored according
to empirically derived keys and translated
into a standard scale system with an average
value of 50 and a borderline value of 70. The
number of items that a subject does not
answer “True” or “False” determines his ?
score. In general, the higher the score is above
50, the more significant is the deviation. The
meaning of scores below 50 is at present not
clear.
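The text gives only the mean (50) and the borderline (70) of the standard scale. The sketch below assumes the usual linear T transformation, with 10 points per standard deviation of the normative group. Under that assumption, a score of 70 lies two normative standard deviations above the mean. The normative mean and standard deviation used here are hypothetical.

```python
def linear_t_score(raw, norm_mean, norm_sd):
    """Linear T score: 50 at the normative mean, 10 points per normative SD."""
    return 50 + 10 * (raw - norm_mean) / norm_sd


# Hypothetical normative values for one scale, for illustration only.
NORM_MEAN, NORM_SD = 18.0, 5.0
for raw in (18, 23, 28, 33):
    t = linear_t_score(raw, NORM_MEAN, NORM_SD)
    print(raw, t, "borderline or above" if t >= 70 else "")
```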

As is pointed out by Schiele, Baker, and 
Hathaway [9], the scales were compiled by 
comparing the responses of clinically diag- 
nosed patients with those of persons not 
under psychiatric care. The particular items 
characterizing a symptom complex were 
identified by the contrasting tendency for 
normal and abnormal patients to respond 
“True” or “False,” regardless of the verbal 
content of the item. 






Each scale was designed to provide a mea- 
sure, in terms of a score, of the strength of 
a certain trend or component in personality 
makeup. High scores on any of the scales 
purportedly indicate deviations in terms of 
which differential diagnoses may be made. 
Although the MMPI has been described as “the
first inventory measuring common specific 
clinical syndromes, in contrast to the earlier 
schedules designed for either the more general 
concept of ‘neuroticism’ or the special states 
like ‘inferiority,’” [9, p. 293], this assertion
has not been tested. In the light of the data to 
be reported from the present factor analysis 
study, this statement is questionable—partic- 
ularly if it implies much emphasis on “specific” 
clinical syndromes for differential diagnosis. 

As a matter of fact, work such as that of
Meehl [8] and Gough [4], who discuss the
clustering of scores of certain scales, and who
speak of the “neurotic scales” (Hs, D, and
Hy) and the “psychotic scales” (Pa, Pt, and
Sc), also suggests considerable overlap of the
various “specific” scales frequently employed
for differential diagnoses. The discovery of
the “neurotic triad” and the “psychotic triad”
in clinical work reinforces the need to make a
more detailed study of individual scale loadings
and their contribution to total test score
variance. Although the authors of the test
suggest that interrelationships among scales
indicate “the dynamic interrelationships of
different clinical syndromes” [6, p. 3], a more
parsimonious interpretation suggests that these
relationships indicate the degree of psychological
identity of the scales.


PROCEDURE 


The MMPI, group form, was administered 
to 110 neuropsychiatric male patients in a 
large Veterans hospital in Los Angeles, Cali- 
fornia. These patients ranged in age from 20 
to 63, with a mean age of 33. The cases re- 
presented a random selection of neurotic and 
psychotic patients (and a few cases in which 
organic factors were also present) tested at 
the hospital during the year 1948. A detailed 
nosological breakdown of the cases was not 
attempted in view of the usual unreliability 
of such categorizing. The comparison group 
consisted of 112 male college students at 


UCLA. 

Although a short 373-item group form was 
used, all K scale items were included in the 
test, thus including all scored items. 

To measure the interscale relationships of 
the MMPI, or, more accurately, to investi- 
gate the extent to which the scales measure 
the same personality factor or factors, Thur- 
stone’s centroid method of multiple factor an- 
alysis was employed. All computations of the 
Pearsonian correlation coefficients for the two 
groups were made on the raw scores, rather 
than the T scores of the scales.
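The sketch below illustrates the first step of Thurstone's centroid method and is not the authors' computation. With communality estimates placed in the diagonal, each first-factor loading is the sum of its column of the correlation matrix divided by the square root of the sum of all entries. Extracting further factors requires reflecting signs in the residual matrix, which is not shown. Filling the diagonal with the highest off-diagonal r in each column is one common communality estimate and is included here as an assumption. All function names are illustrative.

```python
import math


def fill_diagonal_with_highest_r(r):
    """Copy `r`, replacing each diagonal cell by the largest absolute
    off-diagonal correlation in its column (a communality estimate)."""
    n = len(r)
    out = [row[:] for row in r]
    for j in range(n):
        out[j][j] = max(abs(r[i][j]) for i in range(n) if i != j)
    return out


def first_centroid_factor(r):
    """First centroid loadings: column sum / sqrt(sum of all entries)."""
    column_sums = [sum(row[j] for row in r) for j in range(len(r))]
    total = sum(column_sums)
    if total <= 0:
        raise ValueError("matrix sum must be positive; reflect signs first")
    return [s / math.sqrt(total) for s in column_sums]


def residuals(r, loadings):
    """Residual correlations after removing the factor: r_ij - a_i * a_j."""
    n = len(r)
    return [[r[i][j] - loadings[i] * loadings[j] for j in range(n)]
            for i in range(n)]


# A matrix generated exactly by one factor, with squared loadings as communalities.
true_loadings = [0.8, 0.6, 0.5]
r = [[a * b for b in true_loadings] for a in true_loadings]
loadings = first_centroid_factor(r)
print([round(a, 3) for a in loadings])
```

For this single-factor matrix the method recovers the loadings .8, .6, and .5 and leaves zero residuals. With real scale intercorrelations the residuals would be examined for further factors.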


Preliminary to the use of Thurstone’s tech- 
nique in the present investigation, a pilot 
study was made on two groups of 30 and 53 
cases utilizing a statistical technique developed 
by Gengerelli [3]. His method suggested 
that most of the total variance of the MMPI 
could be explained in terms of the variance of 
two or three scales. For example, these prelimi- 
nary studies showed that the correlation be- 
tween the Sc and Pt scales was above .80, sug- 
gesting that the two tests are measuring the 
same personality variable. The correlational 
matrix obtained by Cottle [2] shows an r of .84 
for these two scales, as well as other correla- 
tions reflecting the same type of relationship 
among various scales. 

Since many of the scales contain items also 
appearing in another scale or scales, the pos- 
sible effect of overlapping items on the pre- 
sent investigation was considered. Table 1 
indicates, above the principal diagonal, the r’s 
among scales based on the amount of item 
overlap. Below the diagonal are the number of 
items in common minus those items that are 
scored in the opposite fashion for the respective 
scales. The two halves are, of course, merely 
different representations of the same _rela- 
tionship. Along the principal diagonal (ital- 
icized figures) is given the number of items 
in each scale. This table is presented to 
indicate the amount of communality any 
two scales have as a result of identical items. 
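The paper does not state how the overlap r's above the diagonal were obtained. One model that approximates many of them, offered here as an assumption rather than the authors' formula, treats all items as uncorrelated with equal variances. Under that model, the correlation produced by shared items alone is the net number of common items divided by the square root of the product of the two scale lengths. The scale lengths come from the diagonal and the net counts from below it.

```python
import math


def overlap_r(net_common_items, n_items_a, n_items_b):
    """Correlation between two scale scores arising from shared items alone,
    assuming uncorrelated items of equal variance. `net_common_items` is the
    number keyed alike minus the number keyed in opposite directions."""
    return net_common_items / math.sqrt(n_items_a * n_items_b)


# (net common items, length of first scale, length of second scale)
pairs = {
    ("K", "Hy"): (10, 30, 60),
    ("D", "Hy"): (12, 60, 60),
    ("F", "Pa"): (9, 64, 40),
    ("Hs", "Hy"): (20, 33, 60),
}
for (a, b), (common, na, nb) in pairs.items():
    print(f"{a}-{b}: {overlap_r(common, na, nb):.2f}")
```

This model gives .24, .20, and .18 for the first three pairs, as printed, but .45 for Hs-Hy, where .46 is printed. The table was therefore probably computed in a slightly different way.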
At first glance a correction of the empir- 
ical correlations would seem to be indicated. 
However, the consideration of a correlation 
coefficient as an index of percentage of com- 
mon elements, of which physical identity is 
only one kind, leads to the conclusion that 





136 W. M. WHEELER, K. B. LITTLE, AND G. F. J. LEHNER 


TABLE 1 
AMOUNT OF OveLAP oF SCALES Dugz To CoMMON ITEMS 








= = See ee 





L K F Hs D Hy Pd Mf Pa Pt Se Ma 
L 15 05 03 .00 .08 .03 .00 03 —04 —04 —.03 04 
K 1 30 .02 .00 .09 24 16 .08 06 —.05 .02 08 
F 1 1 64 .02 .03 01 .09 03 18 .02 21 01 
Hs 0 0 1 33 18 46 .02 .00 .03 05 .08 .00 
D 2 4 2 8 60 .20 .14 .03 .04 .26 10 05 
Hy 1 10 1 20 12 60 iat 06 .08 13 12 .00 
Pd 0 6 5 1 7 10 50 02 18 12 16 .08 
Mf 1 3 2 0 2 + 1 60 04 .02 .06 .02 
Pa —1 2 9 1 2 4 8 2 40 09 23 05 
Pt —1 —2 1 2 13 7 6 1 4 48 25 06 
Sc —1 1 15 + 8 8 10 4 13 15 7 18 
Ma 2 3 1 0 —3 0 4 1 2 3 11 46 





extraction of these would be based on spurious reasoning. In other words, positive item overlap is one way in which the scales are related and should not be interpreted as detracting from empirical relationships based on scores obtained on the scales.


RESULTS


The mean values obtained for each group and the combined groups for each of the MMPI scales are given in Table 2. Cottle’s means are included for comparison. Also presented are the differences between the college and NP means on each scale and the critical ratio for this difference. All differences are significant beyond the 1 per cent level of confidence.

As can be seen from these data, the NP group scores significantly above the college group on all scales except K and Mf, where it scores significantly below, confirming the value of the test as a general screening device.

The intercorrelations obtained for the 12 scales used (the question scale was omitted since no records with a question score greater than 25 are included in the present data) are presented in the correlation matrices in Tables 3, 4, and 5.
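The text does not state how the coefficients above the diagonal of Table 1 were computed. A minimal sketch, assuming the common overlap index n_common / √(n_a · n_b), with the net common-item counts taken from below the diagonal and the scale lengths from the diagonal. This assumed index reproduces several printed entries (Hs-D .18, K-Hy .24) but not every one (D-Pt gives about .24 against the printed .26), so treat it as an approximation, not as the authors' computation. The function name is illustrative.

```python
import math


def overlap_r(n_common: int, n_items_a: int, n_items_b: int) -> float:
    """Assumed overlap index: net common items over sqrt(n_a * n_b)."""
    return n_common / math.sqrt(n_items_a * n_items_b)


# Scale lengths from the diagonal of Table 1.
SCALE_ITEMS = {"K": 30, "Hs": 33, "D": 60, "Hy": 60, "Pt": 48}

print(round(overlap_r(8, SCALE_ITEMS["Hs"], SCALE_ITEMS["D"]), 2))   # Hs-D
print(round(overlap_r(10, SCALE_ITEMS["K"], SCALE_ITEMS["Hy"]), 2))  # K-Hy
print(round(overlap_r(13, SCALE_ITEMS["D"], SCALE_ITEMS["Pt"]), 2))  # D-Pt (printed .26)
```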


TABLE 2
MEAN VALUES FOR COLLEGE AND NEUROPSYCHIATRIC GROUPS AND THEIR DIFFERENCES FOR EACH
SCALE OF THE MMPI (THE COMBINED MEAN VALUES AND THE MEAN VALUES FROM
COTTLE’S DATA ARE PRESENTED FOR COMPARISON)

         College        Neuropsychiatric                          Combined   Cottle’s
         N = 112        N = 110                                   N = 222
Scale    Mean    SD     Mean    SD      Diff   SD diff   CR*      Mean       Mean
L         3.3    2.3     5.1    2.6      1.8     .33     5.45      4.2        3.8
K        16.4    4.7    14.5    5.3      1.9     .67     2.83     15.5         ?
F         4.7    3.1     8.6    7.1      3.9    1.15     3.39      6.6        4.0
Hs        3.9    3.2    11.3    7.0      7.4    1.18     6.27      7.6        5.5
D        19.8    5.2    26.5    7.6      6.7     .88     7.61     23.1       19.5
Hy       20.7    4.3    25.7    7.8      5.0     .85     5.88     23.2       21.7
Pd       15.6    4.5    21.4    5.7      5.8     .63     9.21     18.4       15.3
Mf       27.7    4.3    24.7    4.3      3.0     .62     4.84     26.2       25.3
Pa        8.9    2.7    11.7    5.0      2.8     .54     5.18     10.3        9.2
Pt       10.0    7.4    16.9   10.5      6.9    1.18     5.85     13.4       12.4
Sc       10.0    6.9    18.5   12.3      8.5    1.31     6.49     14.2       10.8
Ma       16.9    4.4    18.8    5.6      1.9     .67     2.83     17.8       17.9

*All differences are significant at or beyond the .01 level.
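The CR column is each mean difference divided by its standard error (the SD diff column). The sketch below recomputes it from the printed, rounded entries and checks each ratio against the two-tailed .01 criterion for a normal deviate. The dictionary layout and names are illustrative.

```python
from statistics import NormalDist

# (absolute difference between college and NP means, SD of the difference), Table 2.
TABLE_2 = {
    "L": (1.8, 0.33), "K": (1.9, 0.67), "F": (3.9, 1.15), "Hs": (7.4, 1.18),
    "D": (6.7, 0.88), "Hy": (5.0, 0.85), "Pd": (5.8, 0.63), "Mf": (3.0, 0.62),
    "Pa": (2.8, 0.54), "Pt": (6.9, 1.18), "Sc": (8.5, 1.31), "Ma": (1.9, 0.67),
}

# Two-tailed .01 criterion for a normal deviate (about 2.576).
Z_CRIT_01 = NormalDist().inv_cdf(1 - 0.01 / 2)


def critical_ratio(diff: float, sd_diff: float) -> float:
    """Critical ratio: a mean difference divided by its standard error."""
    return diff / sd_diff


crs = {scale: critical_ratio(d, s) for scale, (d, s) in TABLE_2.items()}
for scale, cr in crs.items():
    print(f"{scale:>2}  CR = {cr:5.2f}  beyond .01: {cr > Z_CRIT_01}")
```

Because the inputs are rounded, a recomputed ratio can differ from the printed one in the second decimal. K, for example, gives 2.84 here against the printed 2.83.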





THE INTERNAL STRUCTURE OF THE MMPI 137 





TABLE 3
CORRELATIONS AMONG THE TWELVE SCALES OF THE MMPI FOR 112 MALE COLLEGE STUDENTS

       L      K      F      Hs     D      Hy     Pd     Mf     Pa     Pt     Sc     Ma
L      –    .366   .046  -.026   .076   .243  -.102  -.029   .127  -.299  -.334  -.151
K    .366     –   -.105  -.305  -.068   .499  -.090  -.162   .193  -.587  -.510  -.297
F    .046  -.105     –    .489   .561   .310   .554   .178   .301   .457   .674   .194
Hs  -.026  -.305   .489     –    .560   .372   .417   .191   .212   .577   .554   .162
D    .076  -.068   .561   .560     –    .458   .414   .260   .383   .570   .484  -.272
Hy   .243   .499   .310   .372   .458     –    .348   .183   .399   .049   .077  -.075
Pd  -.102  -.090   .554   .417   .414   .348     –    .153   .303   .420   .590   .308
Mf  -.029  -.162   .178   .191   .260   .183   .153     –    .332   .417   .391   .124
Pa   .127   .193   .301   .212   .383   .399   .303   .332     –    .234   .304   .014
Pt  -.299  -.587   .457   .577   .570   .049   .420   .417   .234     –    .821   .181
Sc  -.334  -.510   .674   .554   .484   .077   .590   .391   .304   .821     –    .348
Ma  -.151  -.297   .194   .162  -.272  -.075   .308   .124   .014   .181   .348     –


TABLE 4
CORRELATIONS AMONG THE TWELVE SCALES OF THE MMPI FOR 110 MALE
NEUROPSYCHIATRIC PATIENTS

       L      K      F      Hs     D      Hy     Pd     Mf     Pa     Pt     Sc     Ma
L      –    .302  -.097   .138   .027   .134  -.121  -.085   .028  -.285  -.194  -.162
K    .302     –   -.404  -.201  -.185   .138  -.383  -.190  -.261  -.655  -.550  -.451
F   -.097  -.404     –    .334   .154   .021   .507   .300   .690   .536   .764   .411
Hs   .138  -.201   .334     –    .680   .759   .405   .258   .477   .523   .516   .271
D    .027  -.185   .154   .680     –    .690   .453   .346   .346   .635   .469   .038
Hy   .134   .138   .021   .759   .690     –    .298   .257   .299   .344   .222   .075
Pd  -.121  -.383   .507   .405   .453   .298     –    .297   .615   .590   .604   .515
Mf  -.085  -.190   .300   .258   .346   .257   .297     –    .457   .344   .325   .184
Pa   .028  -.261   .690   .477   .346   .299   .615   .457     –    .538   .703   .436
Pt  -.285  -.655   .536   .523   .635   .344   .590   .344   .538     –    .857   .461
Sc  -.194  -.550   .764   .516   .469   .222   .604   .325   .703   .857     –    .539
Ma  -.162  -.451   .411   .271   .038   .075   .515   .184   .436   .461   .539     –


TABLE 5
CORRELATIONS AMONG THE TWELVE SCALES OF THE MMPI FOR MALE COLLEGE STUDENTS AND
MALE NEUROPSYCHIATRIC PATIENTS COMBINED

       L      K      F      Hs     D      Hy     Pd     Mf     Pa     Pt     Sc     Ma
L      –    .241   .070   .262   .226   .274   .081  -.162   .169  -.130  -.066  -.078
K    .241     –   -.343  -.282  -.205   .162  -.315  -.109  -.163  -.639  -.546  -.408
F    .070  -.343     –    .471   .360   .195   .580   .118   .657   .566   .775   .383
Hs   .262  -.282   .471     –    .728   .730   .566   .002   .519   .607   .618   .299
D    .226  -.205   .360   .728     –    .683   .565   .119   .451   .672   .565   .025
Hy   .274   .162   .195   .730   .683     –    .435   .082   .408   .357   .307   .099
Pd   .081  -.315   .580   .566   .565   .435     –    .041   .591   .606   .670   .468
Mf  -.162  -.109   .118   .002   .119   .082   .041     –    .254   .218   .173   .088
Pa   .169  -.163   .657   .519   .451   .408   .591   .254     –    .516   .660   .347
Pt  -.130  -.639   .566   .607   .672   .357   .606   .218   .516     –    .864   .398
Sc  -.066  -.546   .775   .618   .565   .307   .670   .173   .660   .864     –    .504
Ma  -.078  -.408   .383   .299   .025   .099   .468   .088   .347   .398   .504     –













A comparison of the two groups reveals 
significant differences for 17 correlations as 
shown in Table 6. Relevant correlations from 
Cottle’s matrix are also presented. 


TABLE 6
SIGNIFICANT* CHANGES IN SCALE RELATIONSHIPS
FROM COLLEGE TO NP SUBJECTS

r          College     NP      Diff.    Cottle’s Data
K-F         -.105    -.404    -.299         –
K-Hy         .499     .138    -.361         –
K-Pd        -.090    -.383    -.293         –
K-Pa         .193    -.261    -.454         –
F-D          .561     .154    -.407        .344
F-Hy         .310     .021    -.289        .159
F-Pa         .301     .690     .389        .206
Hs-Hy        .372     .759     .387        .588
Hs-Pa        .212     .477     .265        .183
D-Hy         .458     .690     .232        .415
D-Ma        -.272     .038     .310       -.083
Hy-Pt        .049     .344     .295        .182
Pd-Pa        .303     .615     .312        .329
Pa-Pt        .234     .538     .304        .363
Pa-Sc        .304     .703     .399        .415
Pa-Ma        .014     .436     .422        .112
Pt-Ma        .181     .461     .280        .357

*To compute significance of differences between correlations, they were transformed to z values. All differences given in the table were significant at or beyond the .05 level.
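The footnote to Table 6 says only that the correlations were transformed to z values. The sketch below applies the usual test for two independent correlations. It assumes Fisher's z = atanh(r) and a standard error of √(1/(n₁ − 3) + 1/(n₂ − 3)), with n₁ = 112 college students and n₂ = 110 NP patients, and checks three rows of the table against the two-tailed .05 criterion of 1.96. The function name `z_difference` is illustrative.

```python
import math

N_COLLEGE, N_NP = 112, 110


def z_difference(r1: float, n1: int, r2: float, n2: int) -> float:
    """Normal deviate for the difference between two independent correlations.

    Each r is converted with Fisher's z = atanh(r); the standard error of the
    difference is sqrt(1/(n1 - 3) + 1/(n2 - 3)).
    """
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    return (math.atanh(r2) - math.atanh(r1)) / se


# (college r, NP r) for three rows of Table 6.
ROWS = {"K-F": (-0.105, -0.404), "D-Hy": (0.458, 0.690), "Hy-Pt": (0.049, 0.344)}

z_values = {
    name: z_difference(rc, N_COLLEGE, rn, N_NP) for name, (rc, rn) in ROWS.items()
}
for name, z in z_values.items():
    print(f"{name}: z = {z:+.2f}, beyond .05: {abs(z) >= 1.96}")
```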


Two general conclusions can be drawn: (1) Cottle’s group tends to fall between our college group and our NP group not only in mean scores but also in the direction of change of relationships among scales; (2) the trend of change of relationships between scales is a consistent one and deserves further investigation to determine its implications. For example, one speculation can be made about the changes in relationships of the Pa scale. Pa has its maximal relationship with Hy in the normal group and with Sc in the NP group. In Cottle’s group, which included individuals seeking psychological aid, the relationship of Pa to Sc falls between those obtained for the two groups in the present study. These relationships would seem to substantiate the interpretation that paranoid reactions as measured by the items in this scale can be part of two different syndromes, perhaps a neurotic and a psychotic.
A comparison of the correlation coefficients obtained here with the reliability coefficients published for the scales [7] reveals that various pairs of scales show correlations that approximate the reliability coefficients obtained for either member of the respective pair. This would seem to indicate that the particular scales in question do not measure different things, in spite of the different labels applied to them and in spite of the differentiating functions imputed to them. For example, the correlation of .86 between Sc and Pt for the combined groups, if corrected for attenuation on the basis of Holzberg’s reliabilities, which are the highest published to date, rises to the improbable figure of 1.08, while the correlation of .73 between Hs and D rises to .99.
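Holzberg’s reliabilities [7] are not reproduced in this section, so the sketch below works backward. It applies Spearman’s correction for attenuation, r / √(r_xx · r_yy), and computes the product of the two reliabilities implied by the observed and corrected values quoted above: about .63 for Sc and Pt and about .54 for Hs and D. The function names and the reliabilities in the last line are hypothetical illustrations.

```python
import math


def correct_for_attenuation(r_xy: float, r_xx: float, r_yy: float) -> float:
    """Spearman's correction: the observed r divided by sqrt(r_xx * r_yy)."""
    return r_xy / math.sqrt(r_xx * r_yy)


def implied_reliability_product(r_xy: float, r_corrected: float) -> float:
    """Product r_xx * r_yy that turns an observed r into a given corrected r."""
    return (r_xy / r_corrected) ** 2


# Observed combined-group correlations and the corrected values quoted in the text.
print(round(implied_reliability_product(0.86, 1.08), 3))  # Sc-Pt
print(round(implied_reliability_product(0.73, 0.99), 3))  # Hs-D

# Hypothetical reliabilities, for illustration only: whenever the observed r
# exceeds sqrt(r_xx * r_yy), the "corrected" value is greater than 1.
print(correct_for_attenuation(0.86, 0.80, 0.80) > 1.0)
```

A corrected value above 1.0, as for Sc and Pt, means the two scales correlate with each other more highly than the reliabilities would allow if they measured distinct traits.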


In Tables 7, 8, and 9 are presented the factor loadings after rotation for the college group, the NP group, and the two groups combined.





TABLE 7
FACTOR LOADINGS AFTER ROTATION,
COLLEGE GROUP (N = 112)

        I       II      III     IV      h²
L       –      .351     –       –      .242
K     -.630    .578     –       –      .740
F      .590    .450     –       –      .592
Hs     .627    .345     –       –      .538
D      .503    .530     –      .595    .894
Hy      –      .780     –       –      .615
Pd     .556    .425     –       –      .581
Mf     .358     –      .538     –      .441
Pa      –      .510    .339     –      .413
Pt     .908     –       –       –      .926
Sc     .943     –       –       –      .914
Ma      –       –       –     -.595    .452

TABLE 8
FACTOR LOADINGS AFTER ROTATION,
NP GROUP (N = 110)

        I       II      III     IV      h²
L       –       –       –       –      .234
K     -.702     –       –       –      .643
F      .668     –      .458     –      .700
Hs     .529    .740     –       –      .873
D      .481    .648     –       –      .741
Hy      –      .845     –       –      .771
Pd     .700     –       –       –      .551
Mf     .381     –       –       –      .338
Pa     .632     –      .591     –      .780
Pt     .936     –       –       –      .944
Sc     .925     –       –       –      .878
Ma     .620     –       –       –      .494







TABLE 9
FACTOR LOADINGS AFTER ROTATION,
COMBINED COLLEGE AND NP GROUPS (N = 222)

        I       II      III     IV      h²
L       –      .469     –       –      .200
K       ?       ?       ?       ?       ?
F      .688     –       –       –       ?
Hs     .600    .640     –       –       ?
D      .554    .680     –       –       ?
Hy      –      .789     –       –       ?
Pd     .682     –       –       ?       ?
Mf      –       –       –       –      .119
Pa     .602     –       –       –       ?
Pt     .936     –       –       –       ?
Sc     .931     –       –       –       ?
Ma     .550     –       –       –       ?


Normal Group. An examination of the data for the normal group indicates that two major group factors are clearly defined. The first factor has its maximal loadings on the Sc and Pt scales, with values of .943 and .908, respectively. Other positive loadings on this factor are found in Hs, F, Pd, D, and, to a slight extent, Mf. In contrast with these positive loadings, the K scale shows a high negative loading of -.630.


An interpretation of this factor in the light of these loadings would suggest that it indicates primarily concern with one’s self. The scales which are most heavily loaded, Sc and Pt, are two of the three commonly referred to as the psychotic triad, and seem to reflect the encapsulating withdrawal of a schizoid type, including excessive concern with compulsive needs. The extremely high negative loading on the K scale would seem to indicate that when this factor is present to a marked degree in an individual, the usual ego-defensive mechanisms are held in abeyance and the person tends to show himself in the worst possible light.

The second major factor has its maximal loadings on Hy (.780) and, interestingly enough, on K (.578). Other significant loadings on this factor occur on scales D, Pa, F, Pd, L, and Hs. On the basis of these loadings, this factor seems to reflect the neurotic picture of adjustment. The high positive loading of K on this factor, as compared with the high negative loading of K on Factor I, implies that the ego-defenses are intact. Perhaps one of these ego-defensive mechanisms is indicated by the positive loading of .510 on the Pa scale. This suggests that the paranoid projections serve more as a neurotic defense in the normal group than as a component in the schizoid pattern indicated in Factor I. We shall see in a moment that the Pa scale disappears from Factor II in the NP group and has a loading in Factor I.

The third factor found in the normal group has its only significant loadings on Mf and Pa, reflecting the masculinity-femininity variable and its possible relationship to paranoia, a relationship often stressed in psychoanalytic theory.

The fourth factor has significant loadings on D (.595) and Ma (-.595), indicating a bipolar relationship between these two scales. This apparently indicates a dimension of mood independent of the schizoid and neurotic patterns reflected in Factors I and II.


Since scales D, F, and Pd have significant loadings on both Factors I and II, we may infer that the kinds of reactions typified by these scales are part of both the schizoid and the neurotic patterns.

NP Group. The analysis of the data for the NP group also indicates two major factors. Factor I again has its maximal loadings on Pt (.936) and Sc (.925). Other significant positive loadings are found on Pd, F, Pa, Ma, Hs, D, and Mf, with a substantial negative loading for K of -.702.


A comparison of these loadings with those 
found in Factor I for the normal group shows 
considerable similarity. For example, Pt, Sc, 
and K show very similar loadings. The dif- 
ferences which appear are primarily on two 
scales: (1) Hs scale has a lower loading on 
the “psychotic” factor for the NP group than 
it does for the group of college males, reflect- 
ing evidently the differences in defenses be- 
tween these two groups; (2) the other dif- 
ference concerns the loadings of Pa. In the 
NP group the Pa scale has its heaviest loading 
on the same factor as the “psychotic” scales, 
whereas in the college group its heaviest load- 
ing is on the same factor as the “neurotic” 
scales. 










An examination of Factor II, the “neurotic” factor, for the NP group indicates that it has its maximal loadings on only three scales: Hy (.845), Hs (.740), and D (.648). Comparing these loadings with those obtained for the college group, we see that the Hs scale has a minimal loading (.345) for the normals, whereas in the NP group it has a loading of .740, the second highest loading for this factor. It is interesting to note further that in the NP group the K scale has no loading on the neurotic factor, while in the normal group it has the second highest loading, indicating perhaps that in hospital patients the usual neurotic manifestations are accompanied by less of the type of “defensiveness” that K represents. It is well to remember, also, that the loadings of the “neurotic” triad for the hospital group may have been determined by different items of the same scales than the loadings of the neurotic triad in the normal group. A future item analysis may indicate that the loadings of the “neurotic” triad for the hospital group represent something quite different than in the normal group, perhaps a second “psychotic” factor. One must recognize the danger of accepting the present scale titles for interpretation of factors without careful examination of the scale items contributing to the present loadings.


The third factor obtained for the NP 
group has loadings on only two scales, Pa 
(.591) and F (.458). Pa is common to both 
the normal and the NP group, with F replac- 
ing Mf in the hospital group. 

A fourth factor was obtained for the NP 
group, but it disappeared in the process of 
rotation. 


Combined Group. The factors presented by 
combining the two groups are much as would 
be expected from the data obtained in the 
analysis of the two separate groups. The first 
factor again has its major scale loadings on 
Pt (.936) and Sc (.931), with the other 
values in general falling between those ob- 
tained for the separate groups. 

The Mf scale, however, shows some pe- 
culiar behavior. It will be noticed that in the 
combined group its communality for the first 
four factors drops to a value of .119. The 




meaning of this drop is not clear. It may be 
that this reflects a curvilinear relationship be- 
tween Mf and some of the other scales in the 
more heterogeneous population. 
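The h² columns of Tables 7 to 9 give each scale’s communality. A minimal sketch, assuming the rotation was orthogonal (the article does not say), in which case communality is the sum of a scale’s squared loadings across factors. Summing only the loadings printed for D in Table 7 gives about .888, slightly below the printed .894. That gap is what one would expect if the small loadings left blank in the table contribute the rest. On the same reading, the combined-group Mf communality of .119 means the four factors account for only about 12 per cent of Mf variance. The function name is illustrative.

```python
from typing import Sequence


def communality(loadings: Sequence[float]) -> float:
    """h^2 of one scale: the sum of its squared loadings on orthogonal factors."""
    return sum(a * a for a in loadings)


# Loadings of D printed in Table 7 (college group): Factors I, II, and IV.
d_printed = [0.503, 0.530, 0.595]
h2_partial = communality(d_printed)
print(round(h2_partial, 3))  # compare with the printed h^2 of .894
```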


SUMMARY AND CONCLUSION 


The results from the present factor analysis 
of the MMPI seem to indicate that the am- 
bitious goal of measuring specific clinical 
syndromes has not been completely achieved. 
The test permits diagnosis mainly in terms of 
“neurotic” or “psychotic,” but not in terms of 
type of neurosis or psychosis or other more spe- 
cific category. The results do, however, substan- 
tiate the use of the MMPI for distinguish- 
ing between neurotic and psychotic syn- 
dromes. These two syndromes are here defined 
by their maximal loadings on the Hy and 
Sc scales, respectively. 

Present findings would indicate that refined 
differential diagnosis or the formulation of 
dynamic personality descriptions on the basis 
of MMPI profiles is a questionable procedure. Present results are in accord
with findings such as those reported by Schmidt [10],
Gough [4], and Benton and Probst [1], who 
found that specific score profiles on the vari- 
ous scales do not permit differentiation among 
the patients in various psychiatric categories, 
though differentiation can be made between 
normal and abnormal persons. 


Received July 17, 1950. 


REFERENCES

1. Benton, A. L., and Probst, Kathryn A. A comparison of psychiatric ratings with Minnesota Multiphasic Personality Inventory scores. J. abnorm. soc. Psychol., 1946, 41, 75-78.

2. Cottle, W. C. A factorial study of the Multiphasic, Strong, Kuder, and Bell Inventories using a population of adult males. Psychometrika, 1950, 15, 25-47.

3. Gengerelli, J. A. A factorial method whose factors are empirical tests. Amer. Psychologist, 1949, 4, 245-246.

4. Gough, H. G. Diagnostic patterns on the Minnesota Multiphasic Personality Inventory. J. clin. Psychol., 1946, 2, 23-37.

5. Hathaway, S. R., and McKinley, J. C. A Multiphasic Personality Schedule: I. Construction of the schedule. J. Psychol., 1940, 10, 249-254.

6. Hathaway, S. R., and McKinley, J. C. Manual for the Minnesota Multiphasic Personality Inventory. (Rev. Ed.) New York: Psychological Corporation, 1943.

7. Holzberg, J. D., and Alessi, S. Reliability of the shortened Minnesota Multiphasic Personality Inventory. J. consult. Psychol., 1949, 13, 288-292.

8. Meehl, P. E. Profile analysis of the Minnesota Multiphasic Personality Inventory in differential diagnosis. J. appl. Psychol., 1946, 30, 517-524.

9. Schiele, B. C., Baker, A. B., and Hathaway, S. R. The Minnesota Multiphasic Personality Inventory. Lancet, 1943, 63, 292-297.

10. Schmidt, H. O. Test profiles as a diagnostic aid: The Minnesota Multiphasic Inventory. J. appl. Psychol., 1945, 29, 115-131.











AN INTERPRETIVE AID FOR THE Sc SCALE 


OF THE MMPI 


STANLEY J. BENARICK, GEORGE M. GUTHRIE, AND WILLIAM U. SNYDER


THE PENNSYLVANIA STATE COLLEGE 


THE Minnesota Multiphasic Personality Inventory (MMPI) was designed to measure with a single test several of the more important deviations of personality. Of the nine clinical scales, probably the most difficult to interpret is the Sc (Schizophrenia) scale. In their initial presentation, Hathaway and McKinley [3] report that this scale by itself was able to distinguish 60 per cent of observed cases diagnosed schizophrenic. With this level of accuracy there were many “false positives,” or nonschizophrenics with high Sc scores. Elsewhere Meehl and Hathaway [6] report that these nonpsychotics with high Sc scores, while not clinically schizoid, do manifest a pattern of unusual mentation. This observation has been confirmed by the writers in the evaluation of college students seeking help at a campus clinic. While these high-Sc students are by no means psychotic, they do present a confusing diagnostic picture with unusual complaints and a discouraging response to therapy. This last statement is particularly true in those cases with a low neurotic triad.


Extensive experience with the test has indicated that the use of the total profile pattern, rather than of the scales taken separately, is a more valid interpretive approach. Gough [1], Guthrie [2], and Schmidt [7] have emphasized that the schizophrenic profile characteristically has an elevated F score and a general positive slope. The profile of a neurotic with marked symptoms of anxiety differs from the psychotic profile in that F is lower and the neurotic triad is equal to or higher than the Pt (Psychasthenia) and Sc scales. However, even with this knowledge of the schizophrenic pattern, there still remain a number of profiles obtained from nonpsychotic clients which are not readily distinguished from those of psychotic patients.


PROBLEM

This diagnostic difficulty could be somewhat alleviated if a scale for measuring schizophrenia could be developed which did not show elevations for these nonpsychotic deviant personalities. Since it has been demonstrated that there are subsets of items in other scales, such as the Hy (Hysteria) scale, it was felt that a similar subset might be found in the Sc scale which would account for these false positive elevations and which would serve as an aid in identifying these doubtful profiles.


SOURCES OF DATA 

MMPI profiles were available from three 
sources: (1) 183 patients at a hospital for the 
criminally insane, (2) several hundred profiles on file in a campus clinic, and (3) over
one thousand profiles collected by a specialist 
in internal medicine in a large midwestern 
city. 

Since this study was concerned with those 
profiles having elevated Sc scores, only those 
profiles were used which had a raw Sc score 
greater than 20. This is equivalent to a T- 
score of 64 on the norms which do not have 
a K correction. Four groups of profiles were
used:


1. Criterion Psychotic. Thirty profiles of male psy- 
chotics from a hospital for the criminally insane, 
with a mean raw Sc score of 25.9. 


2. Criterion Nonpsychotic. Thirty male college 
students with a mean raw Sc score of 25.4. The var- 
iability of these two groups is almost identical since 
these Sc scores were matched for elevation with the 
preceding psychotic group. 


THE SC SCALE OF THE MMPI l 


3. Cross-Validation Psychotic. Sixteen females and 
four males who were considered psychotic, though 
not hospitalized, by a specialist in internal medicine 
who was treating them for symptoms related to his 
specialty. This group had a mean raw Sc score of 
30.0. 


4. Cross-Validation Nonpsychotic. Four males and 
10 females not called psychotic by the internist, and 
five males and one female from the clinic files. The 
mean raw Sc score was 30.1. 


PROCEDURE AND RESULTS 


When the criterion psychotics were contrasted with the criterion nonpsychotics, an analysis of the items of the Sc scale produced 11 items answered significantly more frequently by the criterion psychotics and 10 items more frequently answered in the schizophrenic direction by the nonpsychotics. The significance of the differences was computed using phi coefficients as outlined by Jurgensen [4]. These items are presented in Tables 1 and 2.
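The article relies on Jurgensen’s tables [4], which are not reproduced here. The sketch below shows the underlying computation. It assumes the critical ratio is phi · √N, the square root of the 1-df chi-square for a 2 × 2 table. For item 76 (17 of 30 criterion psychotics and 8 of 30 nonpsychotics answering True), this gives phi ≈ .30 and a ratio of 2.36, matching Table 1. For other items the approximation lands within a few hundredths of the printed ratio; item 202 gives about 4.07 against 4.05. The function names are illustrative.

```python
import math


def phi_from_counts(keyed_a: int, n_a: int, keyed_b: int, n_b: int) -> float:
    """Phi for a 2 x 2 table of group (A, B) by response (keyed, not keyed)."""
    a, b = keyed_a, n_a - keyed_a  # group A: keyed, not keyed
    c, d = keyed_b, n_b - keyed_b  # group B: keyed, not keyed
    return (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))


def phi_critical_ratio(phi: float, n_total: int) -> float:
    """Large-sample normal deviate for phi (assumed here): phi * sqrt(N)."""
    return phi * math.sqrt(n_total)


# Item 76: 17 of 30 criterion psychotics vs. 8 of 30 criterion nonpsychotics.
phi_76 = phi_from_counts(17, 30, 8, 30)
print(round(phi_76, 3), round(phi_critical_ratio(phi_76, 60), 2))
```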


TABLE 1
ITEMS OF THE Sc SCALE ANSWERED SIGNIFICANTLY MORE FREQUENTLY BY THE CRITERION
PSYCHOTICS

Item                                                                     Critical
Number  Item                                                    Ratio*   Ratio
76      Most of the time I feel blue. (T)**                     17/8     2.36
121     I believe I am being plotted against. (T)               8/0      3.06
157     I feel that I have often been punished without
        cause. (T)                                              20/9     2.87
168     There is something wrong with my mind. (T)              12/5     1.98
196     I like to visit places where I have never been
        before. (F)                                             6/1      2.06
202     I believe I am a condemned person. (T)                  13/0     4.05
315     I am sure I get a raw deal from life. (T)               13/0     4.05
331     If people had not had it in for me I would have
        been much more successful. (T)                          13/1     3.68
334     Peculiar odors come to me at times. (T)                 18/6     3.17
360     Almost every day something happens to frighten
        me. (T)                                                 7/1      2.30
364     People say insulting and vulgar things about me. (T)    14/3     3.18

*Ratio of the number of criterion psychotics to the number of criterion nonpsychotics, both of whom answered the item in a significant or abnormal direction.

**The T and F indicate the direction in which the deviates tend to respond.







TABLE 2
ITEMS OF THE Sc SCALE ANSWERED SIGNIFICANTLY MORE FREQUENTLY BY THE CRITERION
NONPSYCHOTICS

Item                                                                     Critical
Number  Item                                                    Ratio*   Ratio
41      I have had periods of days, weeks, or months when I
        couldn’t take care of things because I couldn’t
        “get going.” (T)**                                      13/27    3.86
97      At times I have a strong urge to do something harmful
        or shocking. (T)                                        8/20     3.11
179     I am worried about sex matters. (T)                     7/19     3.13
238     I have periods of such great restlessness that I
        cannot sit long in a chair. (T)                         11/24    3.38
259     I have difficulty in starting to do things. (T)         12/22    2.58
301     Life is a strain for me much of the time. (T)           ?        ?
320     Many of my dreams are about sex matters. (T)            ?        ?
328     I find it hard to keep my mind on a task or job. (T)    10/20    2.64
355     Sometimes I enjoy hurting persons I love. (T)           5/13     2.20
356     I have more trouble concentrating than others seem
        to have. (T)                                            10/21    2.87

*Ratio of the number of criterion psychotics to the number of criterion nonpsychotics, both of whom answered the item in a significant or abnormal direction.

**The T and F indicate the direction in which the deviates tend to respond.


Using the 11 items of Table 1 as a scale,
all 60 profiles were scored, with each item 
answered in the significant direction given a 
score of plus 1. The results are shown in Table 
3. The efficiency of separation is not at all sur- 
prising in view of the fact that this is the group 
upon which these items were selected. With 
this distribution maximum efficiency of sepa- 
ration was achieved by using a cutting score of 
2.5, so that those scoring 3 or more were called 
psychotic. 
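The scoring rule just described can be sketched as follows. The sketch assumes responses are recorded as True/False per item number. The keyed directions come from Table 1: every item is keyed True except 196, which is keyed False. Counting an unanswered item as not keyed is an added assumption, and the response record is hypothetical.

```python
# Items of Table 1 and the response counted toward the psychotic score.
PSYCHOTIC_KEY = {
    76: True, 121: True, 157: True, 168: True, 196: False, 202: True,
    315: True, 331: True, 334: True, 360: True, 364: True,
}
CUTTING_SCORE = 2.5  # scores of 3 or more are called psychotic


def eleven_item_score(responses: dict[int, bool]) -> int:
    """One point per item answered in the keyed direction.

    Unanswered items add nothing (an assumption, not stated in the article).
    """
    return sum(1 for item, keyed in PSYCHOTIC_KEY.items()
               if responses.get(item) == keyed)


def classify(responses: dict[int, bool]) -> str:
    return "psychotic" if eleven_item_score(responses) > CUTTING_SCORE else "nonpsychotic"


# Hypothetical response record, for illustration only.
example = {76: True, 121: False, 196: False, 315: True, 364: False}
print(eleven_item_score(example), classify(example))  # 3 psychotic
```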

On the cross-validation group, using the 
same cutting score, there were six false posi- 
tives and three false negatives, or a total of 31 
correct placements out of 40. This is an im- 
provement over chance significant beyond the 
1 per cent level. Further examination of Table 3 indicates that a cutting score of 2.5 may be a little low but that we may speak with considerable confidence concerning those who score 4 or more on these 11 items.

TABLE 3
DISTRIBUTION OF SCORES OF PSYCHOTICS AND NONPSYCHOTICS ON THE ELEVEN-ITEM SCALE

         Group 1      Group 2         Group 3      Group 4
Score    Psychotics   Nonpsychotics   Psychotics   Nonpsychotics
11       –            –               –            –
10       –            –               –            –
9        2            –               –            –
8        –            –               –            –
7        3            –               4            –
6        5            –               3            –
5        7            –               3            –
4        4            –               6            1
3        5            2               1            5
2        2            10              2            7
1        2            9               1            5
0        –            9               –            2
Total    30           30              20           20
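The cross-validation counts and the comparison with chance can be checked against the Table 3 distributions. This sketch assumes the Group 3 and Group 4 frequencies as read from Table 3 and takes "chance" to mean a 50 per cent hit rate, since the two groups are the same size. The article does not name its significance test, so the one-tailed binomial test here is a stand-in. The function names are illustrative.

```python
from math import comb

# Score -> frequency for the cross-validation groups, as read from Table 3.
GROUP3_PSYCHOTIC = {7: 4, 6: 3, 5: 3, 4: 6, 3: 1, 2: 2, 1: 1}
GROUP4_NONPSYCHOTIC = {4: 1, 3: 5, 2: 7, 1: 5, 0: 2}


def placements(psychotic: dict[int, int], nonpsychotic: dict[int, int], cut: float = 2.5):
    """Return (false positives, false negatives, correct, total) at a cutting score."""
    false_neg = sum(f for s, f in psychotic.items() if s < cut)
    false_pos = sum(f for s, f in nonpsychotic.items() if s > cut)
    total = sum(psychotic.values()) + sum(nonpsychotic.values())
    return false_pos, false_neg, total - false_pos - false_neg, total


def binomial_upper_tail(k: int, n: int, p: float = 0.5) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


fp, fn, correct, total = placements(GROUP3_PSYCHOTIC, GROUP4_NONPSYCHOTIC)
print(fp, fn, correct, total)                      # 6 3 31 40
print(binomial_upper_tail(correct, total) < 0.01)  # True
```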

It should be borne in mind that this cross- 
validation is a severe test of the discriminative 
effectiveness of these 11 items since the cross- 
validation groups include patients of different 
sex, education, and socioeconomic status. It 
should also be noted that the cross-validation 
psychotics were not considered ill enough to 
warrant hospitalization. 


Since Sc scores are equated for psychotics 
and nonpsychotics in this study, Sc will not 
help to differentiate them as would ordinarily 
be the case. Kazan and Scheinberg [5] point 
out that F is usually validly elevated with psy- 
chotics. Using a cutting score so that all pa- 
tients having a raw F score of 12 or more were 
called psychotic, distributions of F were obtain- 
ed for both the criterion and cross-validation 
groups. The results showed 6 false positives 
and 11 false negatives for the criterion group 
and 6 false positives and 10 false negatives for 
the cross-validation group. These findings indi- 
cate that the F scale deviates in the expected 
direction but does not separate the groups as 


S. R. BENARICK, G. M. GUTHRIE, AND W. U. SNYDER 


effectively as do the 11 items presented in this 
study. 


SUMMARY AND CONCLUSIONS 


Working with the MMPI, clinicians have 
found a number of profiles of nonpsychotics 
showing abnormally elevated Sc scores similar 
to those characteristic of schizophrenic patients. 
The purpose of this study was to develop a 
method which would identify these persons. An 
analysis of the Sc items answered by a total of 
60 psychotic and nonpsychotic patients pro- 
duced 11 items answered significantly more fre- 
quently in the Sc-plus direction by the psy- 
chotics, and 10 items by the nonpsychotics. The 
11 items produced a highly satisfactory sepa- 
ration of a second group of 40 psychotics and 
nonpsychotics. In the use of these 11 items as 
an interpretive aid it must be kept in mind that 
they were derived and cross-validated on pro- 
files which showed Sc above 20 and neither a 
distinctly psychotic nor distinctly neurotic pro- 
file pattern. 


Received June 72, 1950. 
REFERENCES

1. Gough, H. G. Diagnostic patterns on the MMPI. J. clin. Psychol., 1946, 2, 23-37.

2. Guthrie, G. M. Six MMPI diagnostic profile patterns. J. Psychol., 1950, 30, 317-323.

3. Hathaway, S. R., and McKinley, J. C. Manual for the Minnesota Multiphasic Personality Inventory. (Rev. Ed.) New York: Psychological Corp., 1945.

4. Jurgensen, C. E. Table for determining phi coefficients. Psychometrika, 1947, 12, 17-29.

5. Kazan, A. T., and Scheinberg, I. M. Clinical note on the significance of the validity score (F) in the MMPI. Amer. J. Psychiat., 1945, 102, 181-183.

6. Meehl, P. E., and Hathaway, S. R. The K factor as a suppressor variable in the MMPI. J. appl. Psychol., 1946, 30, 525-564.

7. Schmidt, H. O. Test profiles as a diagnostic aid: The MMPI. J. appl. Psychol., 1945, 29, 115-131.


THE MEASUREMENT OF INTELLECTUAL DECLINE
IN THE SENILE PSYCHOSES¹
JACK BOTWINICK AND JAMES E. BIRREN


NATIONAL HEART INSTITUTE, NATIONAL INSTITUTES OF HEALTH 
BETHESDA, MARYLAND 


AND 


THE GERONTOLOGY SECTION, BALTIMORE CITY HOSPITALS 
BALTIMORE, MARYLAND 


THE purpose of this study was to determine the validity of three indices in estimating intellectual deterioration in the elderly. These indices are: (a) the Deterioration Quotient, DQ, of the Wechsler-Bellevue Scale [19], (b) the Efficiency Index, EI, of the Revised Examination for the Measurement of Efficiency of Mental Functioning [2, 3], and (c) the Senescent Decline Formula, SDF, of Copple [5], based upon the Wechsler-Bellevue Scale.

There is a distinct need for valid measures of intellectual functions in the elderly. These measures should give not only an estimate of present level but also an estimate of the amount of decline from a previous maximum level [11]. Such estimates are needed to make judgments about the prognosis of elderly individuals and accompanying guidance for retirement, occupation, recreation, and susceptibility to educative programs. For the institutionalized patient there are questions about discharge or parole, diagnosis, and therapy that require measures of the type used in this study.

¹The cooperation and the assistance of the staffs of the Springfield and Spring Grove Maryland State Hospitals and the Sheppard and Enoch Pratt Hospital in selecting patients, the assistance of Charlotte Fox in standardizing the techniques, and the computational assistance of Betty Benser are gratefully acknowledged.

A previous study [8] pointed to the lack of significant agreement between these indices which purport to measure intellectual deficit. That study did not indicate which of the measures were valid, but the lack of agreement between them cautioned against their use with the elderly until further studies were made. The previous results emphasized the need for a criterion group of deteriorated patients in further studies of these measures. The general
method of the present study was to compare 
the indices of a group of sixty-year-old deterio- 
rated patients with a control group. The se- 
lection of patients was from an institutional- 
ized group with diagnoses of senile psychosis or 
psychosis with cerebral arteriosclerosis. For the 
purpose of this study it seemed unnecessary to 
distinguish the two diagnostic groups of pa- 
tients. There is apparently considerable over- 
lapping of the characteristics of these diseases 
even though in some cases it may be possible to 
differentiate senile psychosis from psychosis 
with cerebral arteriosclerosis [17]. In addition, 
the measures of deterioration used in the study 
have never been advanced as capable of differ- 
entiating the etiology of the impairment. 

The results obtained on these patients were compared with those of the control group previously studied [8]. It is recognized that these populations, the patient and control groups, are not mutually exclusive, and some overlap is expected in their characteristics. The groups were matched for age, education, and socioeconomic background but differed in that one group lived freely in the community while the other required institutionalization in a mental hospital. Psychiatric opinion was that organic changes in these individual elderly persons were precipitating the behavior aberrations. The assumption is made, therefore, that measures of deterioration should reveal significant differences between the groups. The present study was thus designed to determine which of these measures gives the best differentiation between the patient and control populations.


PROCEDURE
A total of 31 institutionalized patients diagnosed as having psychosis with cerebral arteriosclerosis or senile psychosis was studied. The general criteria of selection were identical to those of the previous study [8]: the patients were white, aged 60 to 70 years, had not less than 4 years of formal schooling, and were born in England or the United States (Table 1). All had hearing adequate to follow directions and were able to read typewritten material (primer type, eight characters per inch) with corrected vision.
The present patients were selected from the 
Spring Grove and Springfield Maryland State 
Hospitals and the Sheppard and Enoch Pratt 





JACK BOTWINICK AND JAMES E. BIRREN 


Hospital. In these three mental institutions, with a combined population of approximately six thousand patients, only 31 were selected as suitable for study. Only those patients were included who were regarded by both investigators as meeting the specific and general criteria of the study. Determinations were made after an individual interview with each patient, consultation with the responsible physician, and a review of the case history. The interview was used to assess the patient's ability to cooperate by having him read test material, copy














TABLE 1
CHARACTERISTICS OF PATIENT POPULATION

                                                         Age       Marital
No. Sex Diag.* Major Occupation             Age  Educ.  Admitted  Status    EI†     SD‡    DQ§   IQ‖
 1   ?   SP    Farming-housewife             68    7      68        M      −9.1    37.6     ?    68
 2   ?   SP    Stenog.-housewife             63   10      60        M      −4.8    47.8   108    84
 3   F   PCA   Housewife                     62    7      61        S      −9.9    37.1    64    83
 4   F   PCA   Seamstress-model              65    6      56        M      −7.9    84.7    64    84
 5   F   SP    Millinery buyer               65   10      59        M      −6.0    85.3    91    83
 6   M   PCA   Cigar-maker                   68    4      63        M      −7.4   106.4   145    86
 7   M   PCA   Pressman                      68    7      63        S      −7.2    72.2    84    98
 8   M   PCA   Wholesaler                    67    6      64        M      −7.8    69.9    82    92
 9   M   PCA   Metal-worker                  70    4      66        M      −8.2    84.4     ?    99
10   M   SP    Driver-watchman               61    5      58        M      −7.4    69.7   116    70
11   M   PCA   Salesman                      68    7      55        M        ?     82.8   143    80
12   M   PCA   Butcher                       62    8      51        M      −4.7   123.0   116    93
13   M   SP    Plumber                       67    8      66        M      −8.3    75.3   105    91
14   M   PCA   Laborer                       68    4      65        M      −7.4    68.5    91    82
15   M   PCA   R.R.-worker                   67    8      60        M      −5.6    89.6   111    86
16   M   SP    Laborer                       62    4      59        M      −4.9    49.0   104    66
17   F   PCA   Housewife                     64    4      64        M      −6.8    45.6    36    63
18   F   PCA   Housewife                     60    5      55        M      −8.4    48.5    60    63
19   F   PCA   Housekeeper                   65    4      60        S     −10.0    80.7    62    70
20   M   PCA   Real estate, automobile sales 68   16      67        M      −7.9    96.2    90   105
21   F   PCA   Housewife                     64    6      64        M      −7.2    47.8    38    81
22   F   PCA   Housewife                     69    7      53        M      −5.7    49.8   236    69
23   F   PCA   Housewife                     63    5      62        M      −8.1    52.4   106    69
24   F   PCA   Store-manager, housewife      66   12      59        M      −?.2    99.1   162    80
25   F   SP    Housework                     65    9      64        M      −6.2    59.8    46    67
26   M   SP    Chemist                       68   17      68        M      −7.2    59.9    99   103
27   F   SP    Factory-worker                65    5      65        M     −10.4    43.8    70    83
28   M   ¶     Physician                     60   16      60        M      −4.8    88.6   116    94
29   M   ¶     Marine captain                67   10      59        M      −5.0    73.7   116   113
30   M   ¶     Insurance underwriter         64   16      64        M      −5.2   105.0   102   126
31   F   PCA   Housekeeper                   69    4      59        S        **    49.2    52    76

*SP = Senile psychosis; PCA = Psychosis with cerebral arteriosclerosis.
†Efficiency Index of the Babcock-Levy Examination of Efficiency of Mental Functioning [2, 3].
‡Senescent Decline Index of Copple [5], derived from the Wechsler-Bellevue Scale.
§Deterioration Quotient of Wechsler, derived from the Wechsler-Bellevue Scale [19].
‖Intelligence Quotient, full Wechsler-Bellevue Scale [19].
¶The primary diagnosis of these patients was not established. The staff opinion was that organic age changes were influencing behavior. The cases were included in the study because of this opinion and because the histories of these patients indicated behavior difficulties in recent years rather than throughout the lifetime. These features fitted the purpose of the selection criteria of this study, which were designed to select individuals whose behavior problems arose in later life and were influenced by or based upon presumptive organic factors.
**This patient became ill during the test sessions and did not recover to a testable status.




INTELLECTUAL DECLINE IN SENILE PSYCHOSES 147 


numbers, and reply to questions about his age 
and occupation. A brief statement about the 
study was given to each patient to reassure him 
and to establish rapport for subsequent testing. 

The prototype of the selected patient was an individual who had been adequate throughout his previous life, as judged from his occupational, family, and educational background. Individuals with a history of mental deficiency, neurological disease, or traumatic brain damage (e.g., epilepsy) were excluded, as were individuals who had a history of previous mental illness or whose present status was dominated by functional symptoms (e.g., paranoia). The mean age at first admission to a mental hospital was 61 years (range 51 to 68 years). The sample of 31 patients is biased to the extent that only patients with some function measurable by the tests were included; thus some patients who were bedridden, unable to talk, or unable to follow directions were excluded.

All subjects were tested individually and were given both the Wechsler-Bellevue and the Babcock scales. Indices of deterioration were computed as in the study of normal subjects [8].

RESULTS 

Two of the three indices (SDF and EI) showed significant differences between the senile and control groups (Table 2). The third measure, the DQ, did not differ significantly between the two groups; this index showed no evidence of validity in differentiating the elderly according to their mental status.
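Table 2 reports t for the EI, SDF, and IQ but does not say which form of the t-test was used. The sketch below assumes a separate-variance (unpooled) standard error computed from the printed means, σ values, and Ns; under that assumption it comes within about 0.1 of the tabled t values, and for the DQ it gives |t| below 0.5, consistent with the blank entry in the table. The function name `t_from_summary` and the `TABLE_2` dictionary are illustrative, not from the paper.

```python
import math


def t_from_summary(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """t for two independent groups using a separate-variance standard error.

    Assumption: the paper does not state which t formula it used.
    """
    se = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    return (mean_a - mean_b) / se


# (mean, sigma, N) for (control, senile) as printed in Table 2;
# the senile EI values are based on N = 30.
TABLE_2 = {
    "EI": ((-5.3, 1.6, 50), (-6.9, 1.7, 30)),
    "DQ": ((91.8, 20.1, 50), (95.1, 40.1, 31)),
    "SDF": ((85.1, 21.5, 50), (70.4, 22.2, 31)),
    "IQ": ((100.8, 11.4, 50), (84.1, 14.8, 31)),
}

for index, (control, senile) in TABLE_2.items():
    print(f"{index}: t = {t_from_summary(*control, *senile):.2f}")
```

A pooled-variance t would give a noticeably larger value for the IQ (about 5.7), which is one reason the unpooled form was chosen for this sketch.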


Of the two indices which showed significant 
differences between the two groups, the mean 
difference was larger for the EI than the 


TABLE 2
MEAN DETERIORATION INDICES OF THE SENILE PSYCHOTIC AND CONTROL GROUPS

                        EI       DQ      SDF      IQ
Senile (N = 31)
   Mean               −6.9*     95.1     70.4     84.1
   σ                   1.7      40.1     22.2     14.8
Control (N = 50)
   Mean               −5.3      91.8     85.1    100.8
   σ                   1.6      20.1     21.5     11.4
Mean difference        1.6       –       14.7     16.7
t                      4.1       –        2.9      5.3
P                     <.01       –       <.01     <.01

*N = 30.


SDF. The mean EI was −6.9 (σ = 1.7) for the senile group and −5.3 (σ = 1.6) for the control group, a mean difference of about one standard deviation. In contrast, the mean SDF was 70.4 (σ = 22.2) for the senile group and 85.1 (σ = 21.5) for the control group, a mean difference of about 0.7 standard deviation. The magnitude of the differences, as well as the nature of the distributions, is shown in Figure 1. Each index is converted to a T-score, and the direction of the scores is changed so that a depression in any of the indices implies deterioration or decline.
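The two computations in this paragraph, a mean difference expressed in standard-deviation units and a T-score scaled to the control group (control mean = 50, σ = 10, as in the Figure 1 caption), can be sketched as follows. Using the control-group σ as the reference unit is an assumption, since the text does not say which σ it divided by; the `higher_is_better` flag is a hypothetical parameter standing in for the unstated rule by which the direction of each index was reversed.

```python
def standardized_difference(mean_a, mean_b, sd_ref):
    """Difference between two group means in units of a reference standard deviation."""
    return (mean_a - mean_b) / sd_ref


def t_score(x, control_mean, control_sd, higher_is_better=True):
    """Express a raw index value as a T-score against the control group.

    The control mean maps to 50 and one control SD to 10 points. When low raw
    values indicate the better state (hypothetical flag), the sign is reversed
    so that a T-score below 50 always points toward decline.
    """
    z = (x - control_mean) / control_sd
    if not higher_is_better:
        z = -z
    return 50 + 10 * z


# Mean differences from the text, scaled by the control-group sigma (assumed reference).
print(round(standardized_difference(-5.3, -6.9, 1.6), 2))    # EI
print(round(standardized_difference(85.1, 70.4, 21.5), 2))   # SDF

# Where the senile-group means fall on the control-based T scale.
print(round(t_score(-6.9, -5.3, 1.6), 1))    # EI
print(round(t_score(70.4, 85.1, 21.5), 1))   # SDF
```

On these assumptions the EI gap is 1.0 control σ and the SDF gap about 0.68, matching the "about one" and "about 0.7" standard deviations stated above.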

As Table 2 shows, the IQs of the two groups also differ. This might be expected, since the IQ is based upon a summation of subtest scores: in deteriorated individuals, the decline of subtest scores summates to lower the total IQ. Thus any subtest which shows a differential decline in the two groups will contribute both to an index of deterioration and to a lowering of the IQ.

Another aspect of the validity of the indices concerns their intercorrelations. The deterioration indices, DQ, EI, and SDF, showed only low or zero intercorrelations (Table 3). Although the EI and SDF both showed significant differences between the two populations, the two measures were only weakly correlated (r = 0.39). The conclusion must therefore be drawn that these indices measure different aspects of intellectual decline and are not interchangeable in evaluating an elderly individual. The highest correlation obtained between any of the measures was 0.54, between the IQ and SDF (Table 3). This relation can be explained in part by the fact that the same subtest scores enter into both indices.
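Two points in this paragraph can be made concrete. First, squaring r gives the proportion of variance two measures share, so an r of 0.39 corresponds to roughly 15 per cent shared variance and the 0.54 between IQ and SDF to roughly 29 per cent (the square is the same whatever the sign of r). Second, the part-whole effect: two composites that share a component correlate even when every component is independent. The sketch below uses simulated scores only, not the study's data; `pearson_r` and the composite names are illustrative.

```python
import math
import random


def pearson_r(xs, ys):
    """Product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Proportion of shared variance (r squared) for the coefficients discussed above.
for label, r in [("EI with SDF", 0.39), ("IQ with SDF", 0.54)]:
    print(f"{label}: r = {r:.2f}, shared variance = {r * r:.0%}")

# Part-whole illustration with simulated data (not the study's scores):
# three mutually independent "subtests"; the two composites share subtest b.
rng = random.Random(1)
a = [rng.gauss(0, 1) for _ in range(2000)]
b = [rng.gauss(0, 1) for _ in range(2000)]
c = [rng.gauss(0, 1) for _ in range(2000)]
index_1 = [x + y for x, y in zip(a, b)]
index_2 = [y + z for y, z in zip(b, c)]
print(f"overlap-only correlation: {pearson_r(index_1, index_2):.2f}")  # close to 0.5
```

With one shared component out of two in each composite, the expected correlation is 0.5 even though no two subtests are related.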


TABLE 3
TEST INTERCORRELATIONS FOR THE SENILE GROUP

          EI       DQ      SDF
DQ      −.1?*
SDF     −.39*     .35
IQ      −.22*     .3?      .54

*N = 30; all others, N = 31.


DISCUSSION 
The EI and SDF have some validity in differentiating the elderly, but the low correlation between them leads to the conclusion that they measure different aspects of intellectual decline. From the results of this study and previous ones [1, 4, 7, 14, 15, 18], it is apparent that the DQ has little validity as a measure of intellectual deficit. The failure of the DQ to show any difference between the two groups is due in large part to the fact that the items in the "Hold" and "Don't Hold" portions are not optimally selected: the results of several studies [1, 12, 15] indicate that these subtests are not grouped so as to maximize the measured decline. The SDF weights the subtest scores according to their correlation with age, and in view of the present findings this computation appears preferable to the DQ.

[Figure 1: "Differentiation of the elderly by deterioration indices"; individual T-scores on the EI, DQ, SD, and IQ for normal subjects (N = 50) and senile patients (N = 31).]

Fig. 1. Comparison of test results in normal (N) and senile psychotic (S) individuals 60-70 years of age. Individual scores of 31 patients with senile psychoses and 50 control subjects are plotted as T-scores; control group mean = 50, σ = 10. The dashed line represents the mean T-score of the control group. The indices are the Efficiency Index, EI, of the Babcock-Levy Scale; the Deterioration Quotient, DQ, and the Intelligence Quotient, IQ, of the Wechsler-Bellevue Scale; and the Senescent Decline Formula, SD, of Copple, based on the Wechsler-Bellevue Scale.


The measure which best differentiated the two groups was the EI. This index contains a category called "initial learning," which comprises three subtests. The difference between the two groups on this category was greater than the difference on any of the Wechsler-Bellevue subtests. Future attempts to refine measures of intellectual deficit might well exploit such learning subtests.

The subtests which show the largest decline with age did not show the greatest differences between the two groups of this study. Thus the digit symbol subtest, which declines most with age, showed a smaller difference between the two groups than did the information subtest, which declines least with age. If deterioration in the senile psychoses were essentially a process of accelerated aging, the subtests which show the largest age changes might be expected to show the largest differences between the two groups.
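The accelerated-aging argument amounts to a prediction about rank order: subtests ranked by age decline should fall in the same order when ranked by the senile-control difference, giving a rank correlation near +1. The paper reports no such statistic; the sketch below assumes Spearman's rho as one way to express the comparison. All numbers in it are invented for illustration and are not data from this study; only the placement of information and digit symbol at opposite ends follows the text.

```python
def average_ranks(values):
    """Ranks starting at 1; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman_rho(xs, ys):
    """Spearman rank correlation: the Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((p - mx) * (q - my) for p, q in zip(rx, ry))
    sxx = sum((p - mx) ** 2 for p in rx)
    syy = sum((q - my) ** 2 for q in ry)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical values (SD units), invented for illustration only.
age_decline = {"Information": 0.1, "Comprehension": 0.3, "Similarities": 0.5,
               "Block Design": 0.7, "Digit Symbol": 0.9}
group_difference = {"Information": 0.9, "Comprehension": 0.6, "Similarities": 0.7,
                    "Block Design": 0.5, "Digit Symbol": 0.4}

names = list(age_decline)
rho = spearman_rho([age_decline[k] for k in names], [group_difference[k] for k in names])
print(round(rho, 2))  # a strongly negative value contradicts the accelerated-aging prediction
```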

Related to this problem is one of scatter 
analysis [13, 15, 21]. In the present study, the 
subtests of the Wechsler-Bellevue were com- 
pared for the two groups by an analysis of 
variance. It was found that there was greater 
variation in subtest scores in the control than 
in the senile group. It is felt, therefore, that 
scatter analysis of an elderly individual’s sub- 
test results should be approached with caution. 
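The study compared subtest variation by an analysis of variance. A simpler index that conveys the same idea, offered here as an assumption rather than the authors' method, is each person's variance of weighted subtest scores around his own mean, averaged over a group. The profiles below are hypothetical and serve only to show the computation; population variance is used because each profile is the complete set of subtests for that person.

```python
from statistics import mean, pvariance


def profile_scatter(scores):
    """Spread of one person's weighted subtest scores around that person's own mean."""
    return pvariance(scores)


def group_scatter(profiles):
    """Average within-person scatter across a group of profiles."""
    return mean(profile_scatter(p) for p in profiles)


# Hypothetical weighted-score profiles (five subtests each), for illustration only.
control = [[12, 8, 14, 9, 11], [10, 15, 9, 13, 8]]
senile = [[6, 5, 7, 6, 5], [4, 6, 5, 4, 6]]
print(round(group_scatter(control), 2), round(group_scatter(senile), 2))
```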

Answers to questions regarding the general and specific nature of intellectual decline require high reliability in the items or subtests, much higher reliability than now exists [6, 10]. The reliability of the subtests is related to the difficulty of the individual items: in the elderly it is frequently impossible to obtain a measurable performance on some of the Wechsler-Bellevue subtests. Future attempts to refine the tests should provide for the inclusion of more items of low difficulty.
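How much lengthening a subtest might help can be estimated with the Spearman-Brown prophecy formula, a standard psychometric result that the paper itself does not use. The formula assumes the added items are parallel to the existing ones; easy items added specifically to reach the elderly would not strictly meet that assumption, so the numbers are rough guidance only. The starting reliability of 0.60 is an assumed figure, not one reported here or in [6, 10].

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability when a test is lengthened by `length_factor`
    with items parallel to the existing ones (Spearman-Brown prophecy formula)."""
    k, r = length_factor, reliability
    return k * r / (1 + (k - 1) * r)


def length_needed(reliability, target):
    """Length factor needed to move from `reliability` to `target`, same assumption."""
    return target * (1 - reliability) / (reliability * (1 - target))


# Illustrative subtest reliability of 0.60 (assumed, not taken from the paper).
print(round(spearman_brown(0.60, 2), 3))     # doubling the number of items
print(round(length_needed(0.60, 0.90), 1))   # lengthening needed to reach 0.90
```

Under these assumptions, doubling a subtest raises its reliability from 0.60 to 0.75, while reaching 0.90 would take a test six times as long.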


SUMMARY 


1. The purpose of this study was to deter- 
mine the validity of three measures of intel- 
lectual deficit in the elderly. The indices of 
intellectual deficit of a deteriorated psychotic 
population were compared with those of a con- 
trol population previously studied. The deteri- 
oration indices used in this study were Deteri- 
oration Quotient, DQ, of the Wechsler-Belle- 
vue Scale, the Senescent Decline Formula, 
SDF, of Copple, based on the Wechsler-Belle- 
vue Scale, and the Efficiency Index, EI, of the 
Babcock-Levy Scale. 


2. A total of 31 patients aged 60 to 70 years 
diagnosed as senile psychosis or psychosis with 
cerebral arteriosclerosis was tested. Patients 
were selected who had been intellectually, soci- 
ally, and emotionally adequate but in later life 
required hospitalization due to behavior aber- 
rations associated with presumably organic 
changes. The mean age at first admission to a 
mental hospital was 61 years for this group 
(range 51 to 68 years). 

3. Significant differences between the patients with senile psychoses and the control group were noted for the EI and SDF. The DQ was found not to differentiate the groups. It is suggested that the DQ is an inadequate index of intellectual loss in the elderly. The SDF, also based upon the Wechsler-Bellevue Scale, is to be preferred to the DQ.


4. Low or zero intercorrelations were found among these indices. The EI correlated so weakly with the SDF that these indices must be viewed as measuring different aspects of intellectual decline. The EI of the Babcock-Levy Scale derives its effectiveness in distinguishing the two groups of elderly individuals in part from the category of subtests called "initial learning." Attempts to refine the measures should further develop such tests, as well as increase subtest reliability and add more easy items for use with the elderly.


Received August 14, 1950. 
REFERENCES

1. Allen, R. M. The test performance of the brain injured. J. clin. Psychol., 1947, 3, 225-230.

2. Babcock, Harriet. An experiment in the measurement of mental deterioration. Arch. Psychol., N. Y., 1930, 18, No. 117.

3. Babcock, Harriet, and Levy, Lydia. Test and manual of directions, the revised examination for the measurement of efficiency of mental functioning. Chicago: C. H. Stoelting Co., 1940.

4. Boerum, Alice, and Sarason, S. B. Does Wechsler's formula distinguish intellectual deterioration from mental deficiency? J. abnorm. soc. Psychol., 1947, 42, 356-358.

5. Copple, G. E. Senescent decline on the Wechsler-Bellevue Intelligence Scale. Unpublished Doctor's dissertation, Univ. Pittsburgh, 1948.

6. Derner, G. F., Aborn, M., and Canter, A. H. The reliability of the Wechsler-Bellevue subtests and scales. J. consult. Psychol., 1950, 14, 172-179.

7. Diers, W. C., and Brown, C. C. Psychometric patterns associated with multiple sclerosis. Arch. Neurol. Psychiat., Chicago, 1950, 63, 760-765.

8. Fox, Charlotte, and Birren, J. E. Intellectual deterioration in the aged: Agreement between the Wechsler-Bellevue and the Babcock-Levy. J. consult. Psychol., 1950, 14, 305-310.

9. Gilbert, Jeanne G. Mental efficiency in senescence. Arch. Psychol., N. Y., 1935, No. 188.

10. Hamister, R. The test-retest reliability of the Wechsler-Bellevue Intelligence Test (Form I) for a neuropsychiatric population. J. consult. Psychol., 1949, 13, 39-43.

11. Hunt, J. McV., and Cofer, C. N. Psychological deficit. In J. McV. Hunt (Ed.), Personality and the behavior disorders. New York: Ronald Press, 1944. Pp. 971-1032.

12. Hunt, W. L. The relative rates of decline of Wechsler-Bellevue "Hold" and "Don't-Hold" tests. J. consult. Psychol., 1949, 13, 440-443.

13. Jastak, J. Problems of psychometric scatter analysis. Psychol. Bull., 1949, 46, 177-197.

14. Kass, W. Wechsler's Mental Deterioration Index in the diagnoses of organic brain damage. Trans. Kansas Acad. Sci., 1949, 52, 66-70.

15. Rabin, A. I. Psychometric trends in senility and psychoses of the senium. J. gen. Psychol., 1945, 32, 149-162.

16. Rabin, A. I. Vocabulary and efficiency levels as functions of age in the Babcock method. J. consult. Psychol., 1947, 11, 207-211.

17. Rothschild, D. Senile psychoses and psychoses with cerebral arteriosclerosis. In O. J. Kaplan (Ed.), Mental disorders in later life. Stanford, Calif.: Stanford Univ. Press, 1945. Pp. 233-279.

18. Sloan, W. Validity of Wechsler's Deterioration Quotient in high grade mental defectives. J. clin. Psychol., 1947, 3, 287-288.

19. Wechsler, D. The measurement of adult intelligence. (3rd Ed.) Baltimore: Williams and Wilkins, 1944.

20. Weisenburg, T., Roe, Anne, and McBride, Katharine E. Adult intelligence. New York: Commonwealth Fund, 1936.

21. Wittenborn, J. R. An evaluation of the use of Bellevue-Wechsler subtest scores as an aid in psychiatric diagnosis. J. consult. Psychol., 1949, 13, 433-439.


DIFFERENCES BETWEEN NEUROTICS AND SCHIZOPHRENICS ON THE WECHSLER-BELLEVUE SCALE¹


LAWRENCE S. ROGERS²


VETERANS ADMINISTRATION MENTAL HYGIENE UNIT 


DENVER, COLORADO 


There have been a number of attempts to isolate patterns of subtest scores on the Wechsler-Bellevue which can be of diagnostic value. Fifteen such signs, each said to be characteristic of schizophrenics or of neurotics, have been found in the literature. The purpose of this study is to investigate whether these indices differentiate between neurotics and schizophrenics.


SUBJECTS 


The subjects in this study consisted of 183 
male World War II veterans, 100 of whom 
were diagnosed as neurotic and 83 as schiz- 
ophrenic. The neurotics were patients in Veter- 
ans Administration Mental Hygiene Clinics— 
50 from the Denver Clinic and 50 from an- 
other clinic, as reported by Kogan [2]. Kogan 
also reported on 50 schizophrenics who were 
treated either at a Mental Hygiene Clinic or in 
a hospital. To this group were added 33 more 
schizophrenics who were treated either at the 
Denver Veterans Administration Mental Hy- 
giene Clinic or at the Fort Logan Veterans 
Administration General Medical and Surgical 
Hospital. In every instance the patient's diagnosis was made by the treatment agency and reaffirmed a previous diagnosis. No attempt was made to separate the groups into finer diagnostic categories.


The average IQ of the neurotics was 107.73 
(S.D. 11.69) and of the schizophrenics, 104.16
(S.D. 12.21). The average age of the neu- 
rotics was 26.93 years (S.D. 3.62) and of the 


¹Read at the Meeting of the Rocky Mountain Branch of the American Psychological Association at Fort Collins, Colorado, May 12, 1950.

²Published with the permission of the Chief Medical Director, Department of Medicine and Surgery, Veterans Administration, who assumes no responsibility for the opinions expressed or conclusions drawn by the author.




schizophrenics, 27.69 years (S.D. 5.18). It 
will be seen then that the two groups were 
comparable in both age and IQ, in central 
tendency as well as variability. 


PROCEDURE AND RESULTS 

The procedure consisted of comparing the 
criterion groups on each of the 15 signs found 
in the literature. No attempt was made to 
eliminate those subgroups which might serve to 
decrease the size of the difference between the 
groups. For example, Rapaport [3] points out 
that obsessive compulsives have the same type 
of scatter as the schizophrenics on the Picture 
Arrangement subtest. It was not feasible to re- 
move these subgroups. Hence, the results pre- 
sented may be considered minimal differences. 
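Each of the signs enumerated in the list that follows reduces to a yes/no test on one subject's weighted subtest scores. The sketch below implements several of them on a profile keyed by subtest name. The thresholds are those stated in the list and in Table 1; treating "the mean subtest score" in sign 11 as the mean of every subtest in the profile is an assumption, and the `SIGNS` dictionary, `flag_signs` helper, and example profile are hypothetical.

```python
from statistics import mean
from typing import Callable, Dict

Profile = Dict[str, float]  # weighted subtest scores keyed by subtest name


def sign_11(p: Profile, units: float = 2.5) -> bool:
    """Very low Similarities with high Vocabulary and Information.

    Assumption: "the mean subtest score" is the mean of all subtests in the profile.
    """
    m = mean(p.values())
    return (p["Similarities"] <= m - units
            and p["Vocabulary"] >= m + units
            and p["Information"] >= m + units)


SIGNS: Dict[str, Callable[[Profile], bool]] = {
    "1": lambda p: p["Comprehension"] < p["Vocabulary"],
    "2": lambda p: p["Vocabulary"] - p["Similarities"] >= 3,
    "5": lambda p: p["Vocabulary"] - p["Arithmetic"] >= 2,
    "8": lambda p: p["Block Design"] - p["Picture Completion"] >= 2,
    "11": sign_11,
    "12": lambda p: (p["Picture Arrangement"] + p["Comprehension"]
                     < p["Information"] + p["Block Design"]),
    "13": lambda p: p["Block Design"] - p["Object Assembly"] >= 2,
    # Rabin's ratio > 1, written as a comparison of sums to avoid dividing by zero.
    "14": lambda p: (p["Information"] + p["Comprehension"] + p["Block Design"]
                     > p["Digit Symbol"] + p["Object Assembly"] + p["Similarities"]),
}


def flag_signs(p: Profile) -> Dict[str, bool]:
    """Evaluate every implemented sign on one profile."""
    return {name: test(p) for name, test in SIGNS.items()}


example = {  # hypothetical profile, not a case from this study
    "Information": 12, "Comprehension": 9, "Digit Span": 10, "Arithmetic": 8,
    "Similarities": 9, "Vocabulary": 13, "Picture Arrangement": 8,
    "Picture Completion": 9, "Block Design": 12, "Object Assembly": 11,
    "Digit Symbol": 9,
}
print(flag_signs(example))
```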


The results are presented in Table 1. The specific statements tested are enumerated below.

TABLE 1
SUBJECTS WITH VARIOUS PATTERNS OF SUBTEST SCORES
(Schizophrenics N = 83, Neurotics N = 100)

                                                     Schizo-                         Level of
                                                     phrenics    Neurotics           Confidence
Pattern                                              N    %      N    %      χ²      (per cent)
 1. Comprehension below Vocabulary                   41   49     19   19    19.64     1
 2. Similarities 3 or more below Vocabulary          18   22     11   11     3.90     5
 3. Digit Span higher than Arithmetic                30   36     32   32      –      50-70
 4. Digit Span higher than Vocabulary                19   23     32   32     1.75*   20
 5. Arithmetic 2 or more below Vocabulary            33   40     23   23     5.54     2
 6. Picture Arrangement below Vocabulary             50   60     39   39     9.57     1
 7. Vocabulary greater than Picture Completion       26   31     55   55    10.45     1
 8. Block Design 2 or more above Picture             26   31     16   16     3.73     5
    Completion
 9. Digit Symbol below Vocabulary                    37   45     54   54     1.34    20-30
10. Severe drops in weighted scores                   ?    5      1    1      **
11. Very low Similarities with high Vocabulary        4    5      1    1      **
    and Information
12. Sum of Picture Arrangement plus                  50   60     42   42     6.07     2
    Comprehension is less than sum of
    Information and Block Design
13. Object Assembly 2 or more below Block Design     12   15     28   28     4.95*    5
14. Sum of Information, Comprehension, and Block     47   57     64   64      .94    30
    Design divided by sum of Digit Symbol, Object
    Assembly, and Similarities, greater than unity
15. Sum of Picture Completion and Block Design       39   46     50   50     1.72    20
    greater than sum of Picture Arrangement and
    Object Assembly (neurotic sign)

 *Trend is opposite to that predicted by hypothesis tested.
**Did not appear often enough to warrant statistical treatment.


1. Rapaport [3, p. 128]: "A Comprehension score below the Vocabulary score characteristic of Schizophrenics (especially the Chronic and Deteriorated cases) and Depressives (especially the Psychotic cases). Even the Preschizophrenics, who are in general well preserved, tend to have impaired judgment as estimated by this measure."

2. Rapaport [3, p. 160]: "We conclude that extreme Vocabulary Scatter of Similarities is found greatest in the Depressive Psychosis; the absence of such scatter in the Depressive Neurosis helps to establish the differential diagnosis between these two groups. Such extreme scatter is also present, though to a lesser extent, in the Schizophrenics." By extreme vocabulary scatter Rapaport means that the Similarities subtest score is 3 or more points lower than the Vocabulary score.

3. Rapaport [3, p. 175]: "Diagnostically, then, any superiority, especially a significant superiority, of the Digit Span score over the Arithmetic score should be taken as a schizoid or schizophrenic indication." When a significant superiority of Digit Span over Arithmetic was tested, that is, when the Digit Span score was three or more points higher than Arithmetic, we found little difference between the criterion groups: this pattern occurred in 17 per cent of the neurotics and 22 per cent of the schizophrenics.

4. Rapaport [3, p. 193]: “Digit Span score above 
the Vocabulary level and/or the Mean Verbal level 
is generally characteristic of nondeteriorated Schizo- 
phrenias among the psychoses and of schizoid ad- 
justment among the Normals.” It should be noted 
that we found a tendency in the opposite direction. 

5. Rapaport [3, p. 207]: “We conclude that a great 
drop in the Arithmetic score below the Vocabulary 
level was most frequent in Schizophrenics, especially 
Deteriorated Schizophrenics, and in Depressives.” 
A "great drop" was taken to mean that the weighted Arithmetic score was 2 or more points below the Vocabulary score.


6. Rapaport [3, p. 226]: “We conclude that great 
negative Vocabulary Scatter on the Picture Ar- 
rangement subtest is most characteristic for the De- 
teriorated Schizophrenics and the Depressive Psy- 
chotics and to a lesser extent, for the Acute and 
Chronic Schizophrenics, and the Obsessive Compul- 
sives.” When extreme vocabulary scatter was tested, 
that is, when the Picture Arrangement score was 
4 or more points less than Vocabulary, we found 
19 per cent of our schizophrenics but only 5 per cent 
of the neurotics in this category. 


7. Rapaport [3, p. 248]: "Schizophrenics tend more than any other group to suffer especial impairment on this Subtest" (Picture Completion).
Rapaport also points out that low weighted scores 
on the Picture Completion test occur mostly in the 
schizophrenics. We found that weighted scores of 7 
or below appeared in only 3 per cent of the neurotics 
and in 16 per cent of the schizophrenics. 


8. Rapaport [3, p. 287]: “We conclude that it is 
most characteristic of the Schizophrenics and espe- 
cially of some of the Deteriorated Schizophrenics to 
have Block Design scores very much above Picture 
Completion scores. This may serve as a special diagnostic scatter pattern of Schizophrenia." Garfield [1] reported that he could not substantiate this finding of Rapaport's, since the statement was true of 34 per cent of his schizophrenics and 28 per cent of his control subjects. Our percentages are 32 and 19 respectively.

9. Rapaport [3, p. 295]: “A Digit Symbol score 
well below Vocabulary is most indicative of depres- 
sive trends retarding visual motor speed or of Schiz- 
ophrenic encroachment upon concentration.” 


10. Schafer [5, p. 64] says that among schizophrenics there are severe drops in the weighted scores of some of the subtests. For the average range of intelligence a drop of 6 points or more below the mean might be significant; if the IQ was 110 or more, the drop should be 8 points or more to be significant. This sign does not appear often enough to be treated statistically.

11. Wechsler [6, p. 150]: "Very low Similarities with high Vocabulary and Information are definitely pathognomic for schizophrenia." Using Wechsler's criterion of 2.5 units above or below the mean subtest score to indicate marked variation, we found that none of the cases fell into this category. Using a variation of 1.5 units from the mean subtest score, we found that a total of five cases fit: one of the neurotics and four of the schizophrenics. It appears that this sign does not occur often enough to be treated statistically.

12. Wechsler [6, p. 150]: “In the schizophrenics 
the sum of Picture Arrangement plus Comprehension 
is less than the sum of Information and Block De- 
sign.” Garfield [1] also tested this ratio and found 
it to be true in 63 per cent of his schizophrenics as 
compared to 39 per cent of his controls. In percent- 
ages our figures are 60 and 42 respectively. 

13. Wechsler [6, p. 150]: “Object Assembly is
much below Block Design for the Schizophrenics.” 
It should be noted that we found a distinct tendency 
in the opposite direction. Garfield [1] reports that 
37 per cent of his schizophrenics had Object Assem- 
bly lower than Block Design as compared to 28 per 
cent of his controls. In making a similar comparison 
we find that 28 per cent of our schizophrenics reveal 
such a tendency as compared to 43 per cent of the 
neurotics. 

14. Rabin’s [4] index consisted of the sum of In- 
formation, Comprehension, and Block Design, di- 
vided by the sum of the Digit Symbol, Object As- 
sembly, and Similarities. When this index is greater 
than 1.00 it is supposed to be indicative of schizo- 
phrenia. This sign was found in 57 per cent of the 
schizophrenics and 64 per cent of the neurotics. Gar- 
field [1] reports percentages of 58 and 67 respective- 
ly. 

15. Wechsler [6, p. 151] says—for neurotics: 
“The sum of Picture Completion plus Block Design 
generally is greater than the sum of Picture Ar- 
rangement and Object Assembly.” 
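The paper does not say how the χ² values in Table 1 were computed. The sketch below assumes the ordinary Pearson χ² on the 2 × 2 table of counts (group by sign present or absent), with no continuity correction, and assigns the per cent level from the standard 1-df critical values. For signs 2 and 12 it comes within about 0.05 of the tabled values (about 3.88 and 6.04 against 3.90 and 6.07), but for sign 1 it gives about 19.0 against the tabled 19.64, so treat it as an approximation of the procedure rather than a reproduction. The function names are illustrative.

```python
def chi_square_2x2(hits_a, n_a, hits_b, n_b):
    """Pearson chi-square (1 df, no continuity correction) for a sign
    present/absent in two groups."""
    observed = [[hits_a, n_a - hits_a], [hits_b, n_b - hits_b]]
    row_totals = [n_a, n_b]
    col_totals = [hits_a + hits_b, (n_a - hits_a) + (n_b - hits_b)]
    total = n_a + n_b
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed[i][j] - expected) ** 2 / expected
    return chi2


# Standard 1-df critical values for the per cent levels used in Table 1.
CRITICAL_1DF = [(6.635, "1 per cent"), (5.412, "2 per cent"), (3.841, "5 per cent")]


def confidence_level(chi2):
    """Smallest tabled per cent level that the chi-square reaches."""
    for critical, label in CRITICAL_1DF:
        if chi2 >= critical:
            return label
    return "not significant at 5 per cent"


# Counts from Table 1: schizophrenics (N = 83) vs. neurotics (N = 100).
for sign, schiz, neur in [(1, 41, 19), (2, 18, 11), (12, 50, 42)]:
    chi2 = chi_square_2x2(schiz, 83, neur, 100)
    print(f"sign {sign}: chi2 = {chi2:.2f}, {confidence_level(chi2)}")
```

Whatever the exact computation, the levels this sketch assigns to these three signs (1, 5, and 2 per cent) agree with those in Table 1.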


SUMMARY AND CONCLUSIONS 
The Wechsler-Bellevue records of 100 veterans diagnosed as neurotics were compared with the Wechsler-Bellevue records of 83 veterans diagnosed as schizophrenics, in an effort to determine whether patterns of subtest scores could differentiate between the two groups. Schizophrenics, when compared with neurotics, gave the following differences:
1. Significant at the 1 per cent level of confidence:
    Comprehension score below Vocabulary.
    Picture Arrangement score below Vocabulary.
    Picture Completion score equal to or below Vocabulary.
2. Significant at the 2 per cent level of confidence:
    Arithmetic score 2 or more points below Vocabulary.
    Sum of Picture Arrangement plus Comprehension less than sum of Information and Block Design.
3. Significant at the 5 per cent level of confidence:
    Similarities score 3 or more points below Vocabulary.
    Block Design 2 or more points above Picture Completion.
    Object Assembly equal to or above Block Design.
4. Not significant:
    Digit Span score greater than Arithmetic.
    Digit Span score greater than Vocabulary.
    Digit Symbol score below Vocabulary.
    Picture Completion plus Block Design lower than sum of Picture Arrangement and Object Assembly.
    Sum of Information plus Comprehension plus Block Design divided by sum of Digit Symbol plus Object Assembly plus Similarities greater than unity.
5. Not found often enough to be treated statistically:
    Very low Similarities with high Vocabulary and Information.
    Drops in the weighted scores of 6 or more below the mean weighted score.
Received June 12, 1950. 


REFERENCES

1. Garfield, S. L., and Fey, W. F. The Wechsler-Bellevue and Shipley-Hartford scales as measures of impairment. J. consult. Psychol., 1948, 12, 259-264.

2. Kogan, W. S. An investigation into the relationship between psychometric patterns and psychiatric diagnosis. Unpublished Doctor's dissertation, Univ. of Pittsburgh, 1949.

3. Rapaport, D., Gill, M., and Schafer, R. Diagnostic psychological testing. Chicago: Year Book Publishers, 1945, Vol. I.

4. Rabin, A. I. Test-score patterns in schizophrenic and non-psychotic states. J. Psychol., 1941, 12, 91-100.

5. Schafer, Roy. The clinical application of psychological tests. New York: International Universities Press, 1948.

6. Wechsler, D. The measurement of adult intelligence. (3rd Ed.) Baltimore: Williams and Wilkins, 1944.








CLASSICAL AND STANDARD SCORE IQ STANDARDIZATION OF THE I.P.A.T. CULTURE-FREE INTELLIGENCE SCALE 2


RAYMOND B. CATTELL 


UNIVERSITY OF ILLINOIS 


The relative freedom from cultural influence demonstrated for the culture-free, perceptual type of test does not absolve one from the usual cares in standardization. It is still necessary to attend to questions of stratified sampling, for subgroups differing in culture, e.g., social status groups, may also differ systematically in average intelligence, so that diverse groups must be balanced in the sample. According to the extensive experience of the British military psychologists [10], the chief differences in the outcome of standardization with this type of test are that the curve of increase of score with age is likely to flatten out a bit earlier (at 14 years rather than 16 or 18 years) and that the standard deviation of IQs is likely to be significantly larger.

When Terman in 1916 first began his work with the Binet, 13 points of IQ were accepted as the standard deviation. This was soon raised to 15 [6]. Later work has accepted from 16 (as in the Bellevue-Wechsler) to 18 or 19 points. The recent survey of the total school population of Scotland at the 11-year level [7] resulted in a Terman-Merrill standard deviation of 20 points of IQ. With the advance in test design and the increasing disembarrassment of tests from scholastic, learned material, the recognized standard deviation¹ of IQ has increased, a result which one would expect both because earlier tests were more contaminated with scholastic achievement and because of the well-known aim of class teaching to reduce the scatter of scholastic performance below the level


1The sigma of group tests, apart from any fea- 
tures of design, is greater than for individual tests. 
Some of this is due to intrusion of personality char- 
acteristics (e.¢., giving up entirely when faced with 
difficulty) and some to reduction of real differ- 
ences in the individual test situation through the ex- 
aminer’s helping the dull more than the bright. 


154 


that it would attain as a result of the natural 
differences in intelligence in children. 


THE ADVANTAGES OF THE STANDARD 
SCORE IQ 

The present writer’s nonverbal Scale 1 test, 
published in 1930 [4], proved to have a larger 
standard deviation than any existing group test. 
A group of 3,734 ten-year-olds—the entire ten- 
year-old population of a city of about 200,000 
people and a rural district—yielded a SD of 24 
[1]. But this test still used pictorial forms and 
was not of the class of genuine, culture-free, 
perceptual tests to which the present test be- 
longs. The latter, as shown below, has a SD of 
about twenty-five, which is, perhaps, finally a 
close approach to the real SD of intelligence 
when not cloaked by scholastic effects. How- 
ever, the writer discovered in 20 years’ use of 
the Cattell Scale 1 nonverbal test that the 
average school teacher or guidance psychologist 
is not prepared to accept this SD. A generation 
of psychologists has become accustomed to 
thinking of an IQ of 130 as of a given degree of 
rarity, and if a new test shows children of such 
real ability to be more frequent than his pre- 
conceptions warrant, so much the worse for the 
test! (A fortiori this applies to the demand 
which the new test presents for the reconsider- 
ation of the mental defective IQ limit, setting 
it at 60 instead of 70!) For this and for other 
reasons aptly stated by Johnson [5], there is much to be said for the use of the standard score IQ, i.e., for agreeing upon a SD of IQs and bringing all tests, regardless of their actual spread, to an agreed, conventional relation of percentiles to IQs.
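The rarity argument can be made concrete with a minimal sketch. It assumes IQs are exactly normal with mean 100, an idealization the article does not assert, and the helper names are illustrative rather than part of any published scoring procedure. The sketch shows how the share of children at or above IQ 130 depends on the SD of the scale. It also shows the percentile-preserving mapping that a standard score IQ implies: an IQ on one scale is carried to the IQ with the same z-score on a scale with the agreed SD.

```python
import math


def proportion_at_or_above(iq: float, sd: float, mean: float = 100.0) -> float:
    """Upper-tail share of a normal IQ distribution (an idealized assumption)."""
    z = (iq - mean) / sd
    # For a standard normal Z, P(Z >= z) = 0.5 * erfc(z / sqrt(2)).
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def rescale_iq(iq: float, from_sd: float, to_sd: float, mean: float = 100.0) -> float:
    """Map an IQ to the IQ with the same z-score (same percentile) on another scale."""
    return mean + (iq - mean) * to_sd / from_sd


if __name__ == "__main__":
    for sd in (15, 16, 20, 24):
        share = proportion_at_or_above(130, sd)
        print(f"SD {sd}: {share:.1%} at or above IQ 130")
    # An IQ of 70 on an SD-16 scale has the same z-score as 55 on an SD-24 scale.
    print(rescale_iq(70, from_sd=16, to_sd=24))
```

Under the normal assumption, about 3 per cent of children reach 130 when the SD is 16, and about 10.6 per cent reach it when the SD is 24. Fixing the SD by convention removes exactly this discrepancy.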

PROCEDURE IN THE PRESENT 
STANDARDIZATION 

It has accordingly been our purpose here to standardize the I.P.A.T. culture-free tests (Scales 1, 2, and 3) both in terms of the IQs literally obtained, i.e., what may be called a "classical" IQ standardization, and also in terms of the conventionally acceptable standard score IQ. Although for most purposes the former will still be preferable, the two standardizations can be presented to psychometrists to use according to their purpose and choice. This article describes the process for Scale 2, Forms A and B, which are designed for the eight-year to adult-age range, i.e., to give maximum discrimination between mental ages of 8 and 14 years.

As befits a culture-free test, the data for this standardization were gathered in two countries, Britain and the United States. Though the test is also published in a French edition, as yet insufficient data have accumulated from that source. The present basis of about three thousand cases must be regarded as a contingent standardization reached at the end of the first year of the new test's use and likely to be expanded in future years.

Fig. 1. Frequency distribution of scores of the ten-year-old group. [Plot: separate panels for Form A and Form B, with raw score on the horizontal axis.]


TABLE 1
MEAN SCORES AND SIGMAS FOR SUCCESSIVE AGE GROUPS

Age Group   Mean          Mean     Sigma    Sigma
(Years)     Form A        Form B   Form A   Form B      N
 6          10.2          12.0     6.98     6.78       50
 7          15.1          15.7     6.84     6.92      189
 8          17.7          19.2     6.80     6.84      168
 9          21.8          22.2     6.86     6.92      241
10          23.3          24.2     6.51     6.68     1399
11          26.1          26.9     6.60     6.86      224
12          28.1          28.4     6.15     6.26      179
13          29.2          29.5     5.98     5.85      177
14          (illegible)   31.4     5.62     5.17      203
15          30.7          31.2     5.16     4.61      132
16          31.4          31.8     4.91     4.43      271
17          (illegible)   31.9     5.07     5.28       64
Total                                                3297

Mean sigma for each 4-year interval: ages 6-9, 6.86; ages 10-13, 6.38; ages 14-17, 5.03.
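The article does not say how the mean sigma for each 4-year interval was formed. As a hedged check, the sketch below takes an unweighted mean of the Form A and Form B sigmas over each interval. This reproduces 5.03 for ages 14-17. For the younger intervals it comes within about 0.02 of the published 6.86 and 6.38, so the author may have used a slightly different average, for example one weighted by N.

```python
from statistics import mean

# Sigmas from Table 1, as (Form A, Form B) for each age group in years.
SIGMAS = {
    6: (6.98, 6.78), 7: (6.84, 6.92), 8: (6.80, 6.84), 9: (6.86, 6.92),
    10: (6.51, 6.68), 11: (6.60, 6.86), 12: (6.15, 6.26), 13: (5.98, 5.85),
    14: (5.62, 5.17), 15: (5.16, 4.61), 16: (4.91, 4.43), 17: (5.07, 5.28),
}


def interval_sigma(ages):
    """Unweighted mean of the Form A and Form B sigmas over the given ages."""
    return mean(s for age in ages for s in SIGMAS[age])


for label, ages in (("6-9", range(6, 10)), ("10-13", range(10, 14)), ("14-17", range(14, 18))):
    print(label, f"{interval_sigma(ages):.3f}")
```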





Although a culture-free test is relatively clear of general cultural and educational influences [2, 3], it is no less subject than other tests to "test sophistication" [9, 10]. Indeed, the unexpected ways of thinking demanded are likely to produce a just-discernible practice effect between the first and second exposures to the test. In many of the situations where culture-free tests are used, it is desirable to get rid both of test sophistication and of general cultural influence. Fortunately the former can be avoided, where one is not sure how much test contact children have had, by giving Form A as a practice test before Form B. If there is no significant difference in the means of A and B, one can assume that the children were already test-sophisticated. If there is a difference, only the B form can be used against the saturated norms, and the A form must be used against novitiate norms running about one point of raw score lower. Accordingly it was our purpose to give both A and B forms to all subjects, the A form always being given before the B form.
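The A-then-B check described above can be sketched as follows: compare each child's Form B score with the same child's Form A score, then test whether the mean gain differs from zero. The paired t statistic and the 1.96 cutoff are assumptions, since the article does not say which significance test was used. The mapping to saturated or novitiate norms is one reading of the rule in the text, and the function names are illustrative.

```python
import math
from statistics import mean, stdev


def practice_effect(form_a, form_b, critical_t=1.96):
    """Paired comparison of Form A (given first) with Form B for one group.

    Returns (mean gain, t statistic, significant). critical_t is an assumed
    large-sample 5 per cent two-tailed cutoff, not a value from the article.
    """
    if len(form_a) != len(form_b) or len(form_a) < 2:
        raise ValueError("need two equal-length lists of at least two scores")
    gains = [b - a for a, b in zip(form_a, form_b)]
    mean_gain = mean(gains)
    sd = stdev(gains)
    if sd == 0:
        # Every child gained the same amount; t is unbounded unless the gain is zero.
        t = math.copysign(math.inf, mean_gain) if mean_gain else 0.0
    else:
        t = mean_gain / (sd / math.sqrt(len(gains)))
    return mean_gain, t, abs(t) > critical_t


def norms_for(significant):
    """Which norms each form is read against, following the rule in the text."""
    if significant:
        # A real A-to-B gain: the group was not yet test-sophisticated.
        return {"A": "novitiate", "B": "saturated"}
    # No gain: the group is presumed already test-sophisticated.
    return {"A": "saturated", "B": "saturated"}


gain, t, significant = practice_effect([20, 22, 25, 18, 30], [21, 24, 26, 19, 31])
print(round(gain, 2), round(t, 2), norms_for(significant))
```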

A substantial fraction of the population, ranging from 7 to 17 years, tested to make the following tables, was taken either from two Midwestern university towns of population about twenty-five thousand each, or from a British industrial city of population about two hundred and forty thousand. The means ran about three raw score points lower in the industrial city. Since this was demonstrably not due to any difference in test sophistication, and was in the opposite direction to sampled norms of scholastic (cultural) attainment, it must be due to the relative absence of the normal frequencies of unskilled workers in the two Midwestern university towns. (Both were above average in E. L. Thorndike's study [8].) The means, sigmas, and numbers at each year for the combined industrial, university, and other populations are shown in Table 1. As the numbers for the university towns (713) are much exceeded by those from other regions (2,584), the sample may be considered as well balanced as test populations usually are with respect to the geographical and other variables to be considered in a complete society.


ANALYSIS OF THE DATA 


The mean and sigma for each year sample, as given in Table 1, are worked out as in the following illustration for the ten-year-old group. The ten-year-old group is taken to include children of ages nine years six months through ten years five months. However, the simply calculated variance on such groups would be a combination of variance in intelligence and variance in age (within one year). All scores were therefore corrected to exactly ten years of age. Points were added or subtracted for the months younger or older than ten, according to the age-score gradient of the preliminary drawing of Figure 2. The sigmas are from final measurements calculated on the distributions thus obtained for each year. In this ten-year-old group, for purposes of comparison, the two "extreme" types of population, the industrial city and the university town, are set out separately below. The results similarly obtained for all years are given in Table 1.
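The age correction described above can be sketched as a linear adjustment. The gradient is an assumption: the article read it from a preliminary drawing of Figure 2. The example below instead uses 2.15 raw-score points per year, the slope between the Table 1 Form A means at ages 9 (21.8) and 11 (26.1). The function name is hypothetical.

```python
def correct_to_age_ten(raw_score, age_months, gradient_per_year):
    """Adjust a raw score to its expected value at exactly 10 years 0 months.

    gradient_per_year: rise in mean raw score per year of age near age ten
    (an assumed value here, not the article's figure).
    """
    months_from_ten = age_months - 120
    # Older children are moved down the age-score gradient; younger ones are moved up.
    return raw_score - gradient_per_year * months_from_ten / 12.0


GRADIENT = (26.1 - 21.8) / 2  # illustrative slope from Table 1 Form A means
# The ten-year-old group runs from 9 years 6 months (114) to 10 years 5 months (125).
print(round(correct_to_age_ten(25, 124, GRADIENT), 2))  # a child aged 10 years 4 months
print(round(correct_to_age_ten(20, 114, GRADIENT), 2))  # a child aged 9 years 6 months
```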




These results are plotted in Figure 2. Three features will be observed. (a) The curve flattens out a little earlier than with verbal or other culturally-involved tests. (b) The B form runs consistently a little ahead of the A form, with a mean lead of about one score point at the lower level and about one-half score point for adults. (c) The range for which the test is intended (8 years plus to 14 years or adult levels) gives a slope nicely placed between the score of complete failure, namely, 9.2 points (the score obtainable by chance), and 46 points, the ceiling of the test.²
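A classical IQ entry can be sketched by finding the age at which the mean score equals the child's raw score. This assumes that the "classical" IQ here is the traditional ratio IQ, 100 × mental age / chronological age. The article reads mental ages from the smoothed curve of Figure 2. The sketch below substitutes plain linear interpolation between the Table 1 Form A means and stops at age 13, because the curve flattens after that and the ratio becomes unstable. The names are illustrative.

```python
from bisect import bisect_left

# Form A mean raw scores from Table 1 for ages 6-13, where the curve is still rising.
AGES = [6, 7, 8, 9, 10, 11, 12, 13]
MEANS_A = [10.2, 15.1, 17.7, 21.8, 23.3, 26.1, 28.1, 29.2]


def mental_age(raw_score):
    """Linearly interpolate the age at which the mean Form A score equals raw_score."""
    if not MEANS_A[0] <= raw_score <= MEANS_A[-1]:
        raise ValueError("score outside the interpolated range")
    i = bisect_left(MEANS_A, raw_score)
    if MEANS_A[i] == raw_score:
        return float(AGES[i])
    lo_s, hi_s = MEANS_A[i - 1], MEANS_A[i]
    lo_a, hi_a = AGES[i - 1], AGES[i]
    return lo_a + (hi_a - lo_a) * (raw_score - lo_s) / (hi_s - lo_s)


def classical_iq(raw_score, chronological_age):
    """Ratio IQ: 100 * mental age / chronological age."""
    return 100.0 * mental_age(raw_score) / chronological_age


print(classical_iq(28.1, 10.0))  # a ten-year-old scoring at the 12-year mean
```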


These norms have been set out in "Ready Reckoner" form for reading IQs in the handbook for the test [11]. It remains now to agree upon the basis of the second standardization, which uses the standard score IQ. The sigma of this standard score must be tolerably in conformity with that to which psychologists are accustomed; such a value would be about twenty points in recent discussions on tests. It must also not be too far from the true value for these tests. If the score sigmas at each of the above years are actually changed into IQ sigmas according to the agreed classical standardization table (Figure 2), the mean for all years is between 25 and 26 points of IQ. It seems to the writer a reasonable compromise to set the standard score IQ distribution at a sigma of 24 points of IQ. This is nearer to the present test sigma than to currently accepted sigmas. The assumption is made, however, that the improvement of test construction will result in more widespread acceptance of the higher value, as it has done throughout the past twenty years.

²It has long seemed probable that test-sophistication effects due to test practice are less for bright or older than for dull or younger subjects, since the former "catch on" to instructions and novel requirements straight away and lose their points only with respect to difficult items, not difficult instructions. Vernon and Parry [10] report this in their extensive World War II data. The present data bear on this point in two ways. First, we may compare the A to B gain of the youngest with the middle and with the oldest groups. These figures (taken only for the groups personally tested under the writer's supervision) are: for the 8- and 9-year-olds, 1.1; for the 10- and 11-year-olds, 0.7; for the 12- and 13-year-olds, 0.5; for the 14-, 15-, and 16-year-olds, 0.5. Second, taking ten-year-olds alone and breaking them down into upper and lower quartiles and middle half, according to score level, we obtain an increasing gain from test A to test B as follows: lower, 0.51 score points; middle, 0.86 score points; and upper, 1.19 score points. The figures for mental age when associated with age thus support Vernon and Parry. But for mental age with age constant, they point in the direction of more test improvement in the first retest for the bright. A possible reconciliation with other findings is that the improvement is greater but slower among the dull.

The distribution of raw scores, when plotted for other years as shown above for the tenth year, shows a similar normal form. There is, however, a very slight tendency for the brighter 16- and 17-year-old samples to be skewed away from the ceiling of the test. With this foundation it is simple to work out the standard score IQ by equating 24 points of IQ to whatever sigma is found in raw scores, extending the relation throughout the curve. The mean sigma for years 6, 7, 8, and 9 is 6.86 score points; for 10, 11, 12, and 13, 6.38 score points; and for 14, 15, 16, and 17, 5.03 score points. The decline in sigma is thus steepest around 12, 13, and 14 years, as one might expect if the IQ sigma is to remain constant with a flattening curve. These three sigmas are used at the three age levels, with some smoothing between, in the Ready Reckoner for getting standard score IQs from raw scores and ages, which is given in the Handbook [11]. The corresponding Ready Reckoner for classical IQs, based on Figure 2, is set out in Table 2.
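The standard score IQ described above amounts to IQ = 100 + 24 (x - M) / σ. Here x is the raw score, M is the mean raw score for the child's age and form, and σ is the raw-score sigma for the age band. The minimal sketch below assumes the three unsmoothed band sigmas quoted in the text; the Handbook's Ready Reckoner smooths between bands, which this sketch does not. It also assumes a caller-supplied age mean, such as a Table 1 entry. The names are illustrative.

```python
IQ_SIGMA = 24.0  # sigma adopted for the standard score IQ

# Raw-score sigmas for the three age bands quoted in the text (no smoothing).
BAND_SIGMAS = ((range(6, 10), 6.86), (range(10, 14), 6.38), (range(14, 18), 5.03))


def band_sigma(age_years: int) -> float:
    """Raw-score sigma for a whole-year age group."""
    for ages, sigma in BAND_SIGMAS:
        if age_years in ages:
            return sigma
    raise ValueError("age outside the 6-17 year range covered by Table 1")


def standard_score_iq(raw_score: float, age_mean: float, age_years: int) -> float:
    """IQ = 100 + 24 * (raw score - age mean) / band sigma."""
    return 100.0 + IQ_SIGMA * (raw_score - age_mean) / band_sigma(age_years)


# A ten-year-old scoring 30 on Form A, against the Table 1 Form A mean of 23.3.
print(round(standard_score_iq(30, 23.3, 10)))  # 125
```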


SUMMARY 

1. There is evidence that as intelligence tests have their scholastic contamination reduced, advancing to more culture-free forms, the standard deviation of IQs increases, from about twelve to about twenty-four points of IQ.

2. Since psychometrists like to get accustomed to treating a certain IQ as if it had a certain rarity, it may be desirable to provide with any intelligence test both a "classical," ordinary standardization table and a standard score standardization in which IQs are reduced to some agreed sigma.

3. These recommendations are carried out with respect to the I.P.A.T. culture-free intelligence test, the validity and culture-freeness of which have been established by previously published research. As a compromise, somewhat nearer the "true" than the "traditional" value, a sigma of 24 was adopted for the standard score IQ.





TABLE 2
CLASSICAL IQ STANDARDIZATION

[The body of Table 2 is printed rotated in the original, and its figures could not be recovered from the scan. Rows give age groups from about 7½ years upward; columns give raw scores on Form A and on Form B.]

DIRECTIONS: Find the column which bears the score, and the row marked by the age of the person tested. At that junction will be found the intelligence quotient of the examinee, expressed in relation to a normal of 100.





Fig. 2. Curve of increase of raw score with age. [Plot: raw score against age in years (6 to 17), with the range of use, the ceiling of the test, and the chance score marked.]


4. The classical and standard score conver- 
sion tables are presented for an immediate 
population of 3,295 cases, a stratified sample 
from American and British populations—but 
these are also checked by reference to tests 
standardized on larger populations. 


Received June 12, 1950. 


REFERENCES 


1. Cattell, R. B. The fight for our national intelligence. London: P. S. King, 1937.

2. Cattell, R. B. A culture free intelligence test. I. J. educ. Psychol., 1940, 31, 161-180.

3. Cattell, R. B., Feingold, S. N., and Sarason, S. B. The culture free intelligence test. II. Evaluation of cultural influence on test performance. J. educ. Psychol., 1941, 32, 81-100.

4. Cattell, R. B. A guide to mental testing. (2nd Ed.) London: Univ. of London Press, 1948.

5. Johnson, D. M. Applications of the standard score IQ to social statistics. J. soc. Psychol., 1948, 27, 217-227.

6. Terman, L. M. The intelligence of school children. Boston: Houghton Mifflin, 1919.

7. Thomson, G. H. The trend of Scottish intelligence. London: Univ. of London Press, 1949.

8. Thorndike, E. L. Your city. New York: Harcourt, Brace, 1939.

9. Vernon, P. E. Intelligence test sophistication. Brit. J. educ. Psychol., 1938, 8, 237-244.

10. Vernon, P. E., and Parry, J. B. Personnel selection in the British Forces. London: Univ. of London Press, 1949.

11. A culture free intelligence test. Handbook for Scale 2, Forms A and B. Champaign, IL: Institute for Personality and Ability Testing.








PERSONALITY INVENTORY DATA RELATED TO ACE SUBSCORES


CAROL L. PEMBERTON 


UNIVERSITY OF CHICAGO 


THE purpose of this study was to see whether Munroe's findings [4] would be corroborated by questionnaire data obtained from an adult male population. Those findings concerned the relative standing of female college students on the linguistic and quantitative sections of the ACE, and their Rorschach protocols.

Munroe selected two groups of female col- 
lege students, one whose linguistic (L) scores 
on the ACE were markedly higher than their 
quantitative (Q) scores, and the other in 
which the reverse relationship held. She found 
that the higher-L girls had a more “subjective” 
orientation to life, whereas the higher-Q stu- 
dents were more apt to give a literal construc- 
tion to objective reality. 

Similar results have been reported by Him- 
melweit [3], who used the Raven Progressive 
Matrices and the Mill Hill Vocabulary Test. 
The group higher on the Matrices test contain- 
ed significantly more patients with a hysterical 
make-up and symptomatology, while those 
higher on Vocabulary were more anxious and 
depressed. 


SUBJECTS

The subjects in the present investigation were 168 male executives employed by a large mail-order firm.


PROCEDURE 


All subjects took the ACE (1943 College 
Edition), the Thurstone Temperament Sched- 
ule, the Guilford Inventory of Factors 
STDCR, the Guilford-Martin Inventory of 
Factors GAMIN, the Guilford-Martin Per- 
sonnel Inventory I, the Allport-Vernon Study 
of Values, and the Kuder Preference Record. 

A higher-L group was obtained by taking 
45 individuals whose standard scores (based on 




national norms) on the linguistic section of the 
ACE were one sigma higher than their stand- 
ard scores on the quantitative section. Since all 
the subjects were members of a linguistic type 
of profession, it is not surprising that individu- 
als with higher Q than L scores were fewer in 
number. The higher-Q group consisted of 45 
people whose standard scores for Q were 
higher than their standard scores for L. 

Using these two groups, chi-squares were 
calculated for the 35 scores yielded by the vari- 
ous personality and interest questionnaires. 
The four-fold tables were obtained by splitting 
the questionnaire scores at the median for the 
total group of 168. 
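The fourfold tables described above cross-classify each subject by group (higher-L or higher-Q) and by whether the inventory score falls above or below the median of all 168 subjects. The sketch below uses the ordinary Pearson chi-square without Yates' correction; the article does not say whether a correction was applied. It also uses the standard critical values for 1 degree of freedom, which correspond to the 10, 5, 2, and 1 per cent levels in Table 1. The cell counts in the example are invented for illustration, and the function names are hypothetical.

```python
# Critical chi-square values for 1 degree of freedom, keyed by per cent level.
CRITICAL_1DF = {10: 2.706, 5: 3.841, 2: 5.412, 1: 6.635}


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the fourfold table

                  above median   below median
        higher-L       a              b
        higher-Q       c              d
    """
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        raise ValueError("every row and column total must be positive")
    return n * (a * d - b * c) ** 2 / denominator


def significance_level(chi2):
    """Smallest tabled per cent level reached, or None if not significant at 10 per cent."""
    reached = [level for level, critical in CRITICAL_1DF.items() if chi2 >= critical]
    return min(reached) if reached else None


# Invented counts: 30 of 45 higher-L and 15 of 45 higher-Q subjects above the median.
chi2 = chi_square_2x2(30, 15, 15, 30)
print(chi2, significance_level(chi2))  # 10.0 1
```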

Tetrachoric correlation coefficients were 
calculated between the various inventory scores 
and the L-scores, Q-scores, and total ACE 
scores, using the entire group of subjects. 
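The article does not say how the tetrachoric coefficients were computed. For illustration, the sketch below uses the cosine-pi approximation, r ≈ cos(π / (1 + √(ad/bc))), a well-known shortcut that is not a full maximum-likelihood tetrachoric estimate. Here a and d are the cells where both variables are high or both are low. The function name and counts are hypothetical.

```python
import math


def tetrachoric_cos_pi(a, b, c, d):
    """Cosine-pi approximation to the tetrachoric correlation for [[a, b], [c, d]].

    a and d are the concordant cells (both high or both low). This is an
    approximation, not a maximum-likelihood tetrachoric estimate.
    """
    if a * d == 0 and b * c == 0:
        raise ValueError("table too sparse for the approximation")
    if b * c == 0:
        return 1.0
    if a * d == 0:
        return -1.0
    return math.cos(math.pi / (1.0 + math.sqrt((a * d) / (b * c))))


print(round(tetrachoric_cos_pi(30, 15, 15, 30), 3))  # 0.5
```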


RESULTS 


Table 1 lists the questionnaire scores which 
differentiated between the higher-L and 
higher-Q groups, together with the chi-squares 
and levels of significance obtained. 


DISCUSSION 


The higher-L group appears to comprise individuals who are "subjectively" oriented. They are more Socially Introverted (V),¹ more Reflective (II), and more interested in Literary (I), Theoretical (XII), and Esthetic (IV) pursuits than the higher-Q group. Introversion, subjectivity, and reflectiveness, however, should not be equated with neuroticism, as has been done by the authors of some personality inventories. The higher-L group also shows Lack of Nervous Tenseness and Irritability (X) and professes greater Cooperativeness (XI). Guilford states that high scores in Cooperativeness indicate a "willingness to accept things and people as they are, and a generally tolerant attitude."² Although not socially outgoing, the higher-L group is more interested in Social (VII) issues. The constellation of scores which differentiates the higher-L group points toward orientation in the self and "inner strength," in Rorschach terminology, as the underlying unifying principle.

¹Roman numerals refer to the items in Table 1.

TABLE 1
SCORES WHICH DIFFERENTIATE HIGHER-L AND HIGHER-Q GROUPS

                                                                                         Level of
                                                                     Rela-     Chi-      Significance*
Score Category                                  Author               tionship  Square    (Per Cent)
I. Literary                                     Kuder                L         14.73      1
II. Reflective                                  Thurstone            L          8.75      1
III. Economic                                   Allport-Vernon       Q          8.18      1
IV. Esthetic                                    Allport-Vernon       L          6.15      2
V. Social introversion                          Guilford             L          5.50      2
VI. Agreeableness                               Guilford-Martin      Q          5.60      2
VII. Social                                     Allport-Vernon       L          4.98      5
VIII. Masculinity of attitudes and interests    Guilford-Martin      L          4.78      5
IX. General pressure for overt activity         Guilford-Martin      Q          4.65      5
X. Lack of nervous tenseness and irritability   Guilford-Martin      L          4.62      5
XI. Cooperativeness                             Guilford-Martin      L          4.55      5
XII. Theoretical                                Allport-Vernon       L          4.05      5
XIII. Persuasive                                Kuder                Q          3.73     10
XIV. Rhathymia                                  Guilford             Q          2.92     10

*For 1 degree of freedom.


The higher-Q group seems to be more dependent on the external environment for primary motivation. They are Socially Extraverted (V), express General Pressure for Overt Activity (IX), regard things in terms of Economic (III) value, and are more interested in Persuasive (XIII) occupations. They are higher on Guilford's Agreeableness (VI) factor. A review of the items contributing to this latter score indicates that the factor could better have been named "submission to authority" or "social conformance." The higher-Q group reacts in a lively and impulsive manner (Rhathymia [XIV]), but if people are too different from themselves they become intolerant (XI) and irritable (X). These traits are all descriptive of people who are highly stimulated by the environment and are more dependent on the opinions of others and on external values. Further research might relate these findings to the experiments of Witkin [5], which differentiate individuals who are dependent on the visual field from those who utilize their body sensations for orientation.

²The Guilford-Martin Temperament Profile Chart. Sheridan Supply Co., Beverly Hills, California.

High Masculinity of Attitudes and Interests 
(VIII) as associated with higher-L scores ap- 
pears at first sight to be contradictory. It is pos- 
sible that this chi-square is one which reached 
the 5 per cent level of significance by chance. 
On the other hand, our group consisted entirely 
of male executives. If the higher-L group is 
more tolerant, less irritable, and more accept- 
ing of others, one might have predicted that 
they would be more accepting of themselves 
in their "masculine" role.


It must be stressed that it is the relationship of L to Q scores, rather than the level of either, which is important in producing these results. The tetrachoric correlations between ACE subscores and the 35 inventory scores yield only two statistically significant coefficients (above .38). These are L-score with Theoretical (Allport-Vernon) and L-score with Literary (Kuder), both .43.


The present study corroborates Munroe’s 
findings. Had we, like Himmelweit, been deal- 
ing with a group of neurotics, it might be sur- 
mised that the higher-L group would tend to 
withdraw further from reality and develop 
anxiety or depressive symptoms, while the 
higher-Q group, with greater susceptibility to 
external stimulation, would be more likely to 
develop hysterical symptoms. 

These differences have been found with a coarse grouping of scores into Linguistic and Quantitative areas, compared with questionnaire data. More fruitful information might be forthcoming if the scatter patterns of a test with factorially pure subcategories were compared with more reliable personality estimates. Statistical methods for dealing with entire profiles are needed. Cattell [1] and Cronbach [2] have proposed techniques that may prove useful in this area.


SUMMARY 


The ACE and several personality and in- 
terest inventories were given to 168 subjects. 
A group of 45 individuals with markedly 
higher L than Q scores was selected, and an- 
other group of 45 with higher Q than L 
scores. Chi-squares were calculated for the 35 
scores yielded by the inventories. The higher- 
L group was significantly more reflective and 
socially introverted, with higher literary, es- 
thetic, and theoretical interests. The higher- 
Q group was more extraverted, socially con- 
forming, interested in economic and practical 
matters, and interested in persuasive occu- 
pations. The latter group felt more general 
pressure for overt activity, expressed feelings 
of greater nervous tension and irritability, 







and demonstrated less interest in social issues, 
greater lack of tolerance, and lower mascu- 
linity of attitudes and interests. 

The importance of studying profile patterns 
and the need for statistical tools making this 


possible are indicated. 


Received June 28, 1950. 
REFERENCES

1. Cattell, R. B. rₚ and other coefficients of pattern similarity. Psychometrika, 1949, 14, 279-298.

2. Cronbach, L. J. "Pattern tabulation": A statistical method for analysis of limited patterns of scores, with particular reference to the Rorschach test. Educ. psychol. Measmt., 1949, 9, 149-171.

3. Himmelweit, H. T. The intelligence-vocabulary ratio as a measure of temperament. J. Personality, 1945, 14, 93-105.

4. Munroe, Ruth. Rorschach findings on college students showing different constellations of subscores on the ACE. J. consult. Psychol., 1946, 10, 301-316.

5. Witkin, H. A. Perception of body position and of the position of the visual field. Psychol. Monogr., 1949, 63, No. 302.


THE VALIDITY OF THE HEWSON RATIOS 


JOHN I. WHEELER, JR., AND WALTER L. WILKINS


ST. LOUIS UNIVERSITY 


THE problem of this investigation was to check the validity of the Hewson ratios [4], which seem, on preliminary inspection, to be the most promising of recent attempts to utilize psychometric data in diagnosis. Recent reviews of the literature on scatter analysis [3, 5, 7, 10] differ in their estimates of the clinical usefulness of such techniques. The Hewson proposals, however, were so carefully worked out and so sophisticated that they deserve separate attention.


Hewson devised her ratio method of diagnosis in an effort to develop a better method of scatter analysis of the weighted subtest scores of the Wechsler-Bellevue Intelligence Scale than those previously proposed. Her original purpose was to demonstrate how the Wechsler could "be used to reflect the presence of cerebral pathology in an adult." The stated purpose was merely to try to differentiate the patients with cerebral pathology (the organics) from all others. The actual results, however, led her to conclusions somewhat beyond those planned. It was found that applying the ratios not only differentiated the organics from all others, but also gave supposedly important and reliable clues in the breakdown of the "all others" group into psychoneurotics and normals.


The actual Hewson ratios are intersubtest 
comparisons using the weighted subtest scores 
of the Wechsler-Bellevue [11]. The quo- 
tient produced by each of eight ratios is placed 
in one of three diagnostic categories by 
comparing it to the critical cutoff scores, and 
a summary diagnostic judgment is finally 
made. Although Hewson has presented 13 
diagnostic ratios, only eight of them are used 
to arrive at the summary diagnosis. In the 


present study only these eight and two others 
are utilized. How valid are these when 
applied to weighted subtest scores? How 
valid are they when applied to a nosologi- 
cal group not included in the original Hewson 
report? Does this validity indicate that the 
technique should be used in the clinical situa- 
tion freely, with caution, or not be used be- 
cause of the possibility of the results being mis- 
leading? 
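The mechanism described above can be sketched in hedged form. Each ratio's quotient is compared with two cutoff scores and placed in one of three diagnostic bands, and the eight placements are then tallied. The cutoffs, the order of the bands, and the function names are hypothetical placeholders: this article does not give Hewson's actual values or her rule for combining the eight placements into a summary judgment. The tally is only an input to such a rule, not the rule itself.

```python
from collections import Counter


def classify_quotient(quotient, lower_cut, upper_cut, bands):
    """Place one ratio's quotient in one of three diagnostic bands.

    bands = (label below lower_cut, label between the cuts, label above upper_cut).
    Cutoffs and labels are hypothetical inputs, not Hewson's published values.
    """
    if lower_cut > upper_cut:
        raise ValueError("lower_cut must not exceed upper_cut")
    below, middle, above = bands
    if quotient < lower_cut:
        return below
    if quotient > upper_cut:
        return above
    return middle


def tally_placements(placements):
    """Count how many ratios fell into each diagnostic band."""
    return Counter(placements)


# Illustrative use with made-up quotients, cutoffs, and band order.
BANDS = ("cerebral pathology", "normal", "neurotic")
quotients = [0.82, 1.05, 1.31, 0.97]
placements = [classify_quotient(q, 0.9, 1.2, BANDS) for q in quotients]
print(tally_placements(placements))
```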


The ratios were applied to several samples of adequately diagnosed groups, totalling 236 subjects: 84 diagnosed neurotic, 113 psychotic (functional), and 39 normal. Adequate diagnosis was partially ensured by using only the results of adult patients fully staffed at a St. Louis psychiatric clinic, the Child Center of Our Lady of Grace, during 1947 to 1950. Published Wechsler-Bellevue results were also used, namely: all psychotics diagnosed as schizophrenia, paranoid condition, and depression; all neurotics diagnosed as hysteria, depression, anxiety, and mixed neurosis; and all normals called well-adjusted patrolmen in Rapaport [12]. Also included were all cases in Burton and Harris [1], Schafer [9], and Kings County Bulletin [12] with full Wechsler reports and clear diagnoses fitting the nosological groups considered. No organic Wechsler test records were used. It was decided to attempt
the checking of the validity of the ratios by 
using samples of patients from nosological 
groups other than the principal one with 
which Hewson worked. If the technique 
should place a sufficiently large percentage of 
subjects without demonstrable cerebral path- 
ology into such a classification, it would ques- 
tion the validity of the procedure for differ- 
ential diagnosis of organic brain involvement. 
If the ratios consistently give a summary 










diagnostic judgment of no cerebral pathology 
or only infrequently classify patients from 
other diagnostic classifications as cerebral 
pathology, it would lend weight to the cre- 
dence which the clinician might give the 
Hewson proposals. 


TABLE 1
PERCENTAGES OF THE SUMMARY DIAGNOSTIC JUDGMENT IN NEUROTIC, NORMAL, AND PSYCHOTIC SAMPLES

Sample (N)         Summary Judgment       %        σ%
Normal (39)        Neurotic               33.33    7.53
                   Normal                 46.15    7.98
                   Cerebral Pathology     20.51    6.47
Neurotic (84)      Neurotic               44.05    5.42
                   Normal                 29.76    4.99
                   Cerebral Pathology     26.19    4.80
Psychotic (113)    Neurotic               37.16    4.53
                   Normal                 18.59    3.66
                   Cerebral Pathology     44.25    4.67

Percentages for each of the three samples (neurotic, normal, and psychotic) are reproduced in Table 1. It is interesting that approximately 45 per cent of each group falls into a single category. That category is the correct one for the neurotics and normals, and cerebral pathology for the psychotics. The remainder of each sample is divided between the other two categories. It appears, however, that this information is of little use to the clinician, since the chances are better than 99.73 in 100 that the true value of each of these high percentages is not as great as 70 per cent for any one of the three instances. A technique which identifies functional psychotics as neurotics 37 per cent of the time, or normals as neurotics a third of the time, should be used hesitantly by clinicians.
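The σ% column of Table 1 agrees, to within about 0.02 percentage points, with the usual standard error of a percentage, 100 √(pq / N). For example, 46.15 per cent of 39 normals gives 7.98. The "99.73 in 100" figure in the text is the coverage of a band of three standard errors either side of the estimate under the normal curve. A minimal sketch, with illustrative function names:

```python
import math


def percentage_se(percent, n):
    """Standard error of a percentage, in percentage points: 100 * sqrt(p * q / n)."""
    p = percent / 100.0
    return 100.0 * math.sqrt(p * (1.0 - p) / n)


def three_sigma_band(percent, n):
    """Limits p +/- 3 SE, which enclose about 99.73 per cent of a normal sampling distribution."""
    se = percentage_se(percent, n)
    return percent - 3.0 * se, percent + 3.0 * se


print(round(percentage_se(46.15, 39), 2))  # 7.98
print(round(percentage_se(44.05, 84), 2))  # 5.42
print(three_sigma_band(44.05, 84))
```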


An attempt was made to ascertain whether 
or not any particular one of Hewson’s ratios 
I through X was valid in correctly diagnos- 
ing the patients within a sample. Each of the 
ratios correctly diagnosed the normal subjects 
as normal at least 74 per cent of the time. Ap- 
parently, for normal subjects, the over-all judg- 
ment of the ratios is poorer than that of any 







one of the individual ratios from which it is 
induced. But it can hardly be said that any 
one of the ratios would be very useful in dif- 
ferential diagnosis since it was also found that 
even the subjects in the two abnormal samples 
were nondeviant (normal) more than half
the time. 


Not one of the individual ratios correctly 
diagnosed the neurotic subjects more than 
one-quarter of the time. All of the ratios ap- 
plied to the neurotic sample turned out nor- 
mal more than 52 per cent of the time. 


Each of the ratios does a fairly good job of 
dividing the psychotic sample into three large 
groups: neurotic, normal, and cerebral path- 
ology, although the only category in which 
the ratios place more than 50 per cent of the 
functional psychotics is the normal classifi- 
cation, and this for each of the eight ratios. 

The chi-square test was utilized to ascer-
tain whether any tendency toward correct
diagnosis was present in the indications from
the Hewson ratios, and the result suggests very
strongly that such a tendency does exist. Chi-square
for the three samples—neurotic, normal, and 
psychotic—as classified into neurotic, normal, 
or cerebral pathology was 20.3916, with a 
probability of less than .001 for the 4 degrees 
of freedom. Neurotics, therefore, tend to be 
diagnosed neurotic, normals to be diagnosed 
normal, and functional psychotics to be diag- 
nosed as cerebral pathology. 
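A Pearson chi-square of this kind compares each observed frequency with the frequency expected if sample and Hewson diagnosis were independent: χ² = Σ (O − E)²/E, with E = (row total × column total)/N and (r − 1)(c − 1) = 4 degrees of freedom for a 3 × 3 table. The sketch below assumes frequencies recovered from Table 1 by multiplying each percentage by its sample size (for example, 46.15 per cent of 39 normals is 18 subjects). The article does not print the frequency table behind its reported value, so this illustration is not expected to reproduce 20.3916.

```python
def pearson_chi_square(table):
    """Return (chi_square, degrees_of_freedom) for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected frequency under independence of sample and diagnosis.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


# Rows: normal (39), neurotic (84), psychotic (113) samples.
# Columns: Hewson diagnosis neurotic, normal, cerebral pathology.
# Counts are percentage x N from Table 1, rounded to whole subjects (an assumption).
counts = [
    [13, 18, 8],
    [37, 25, 22],
    [42, 21, 50],
]

chi2, df = pearson_chi_square(counts)
print(f"chi-square = {chi2:.2f} with {df} degrees of freedom")
```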


The samples were then combined into 
groups which would allow for testing of the 
significance of each of the diagnostic catego- 
ries against its comparable sample—or in the 
last instance against the sample which it tend- 
ed to include under its heading. The neurotic 
and nonneurotic (normal plus cerebral path- 
ology) summary diagnostic judgments of the 
ratios were tested with the neurotic and non- 
neurotic (normal and psychotic) samples, giv- 
ing a chi-square of 1.2391, with a probability 
of between .20 and .30 for the 1 degree of 
freedom, suggesting no definite tendencies. 
When the neurotic and cerebral pathology 
diagnostic indications were combined into one 
nonnormal category, and the neurotic and 
psychotic samples were combined into one 
nonnormal sample, and these compared with 







VALIDITY OF THE HEWSON RATIOS 165 


the normal classification of the normal sam- 
ple, a chi-square of 9.9836 was obtained, with 
a probability of just slightly greater than one 
in a thousand. This indicates a significant 
tendency for the ratios to classify the normal 
sample as normal and the nonnormal sample 
as nonnormal. Testing of the cerebral pathology
diagnosis for the psychotic sample in a comparable
way gave a chi-square of 10.9168, with a probability
of less than .001, suggesting that the Hewson ratios
have a significant tendency to classify psychotics as
cerebral pathology. The significance of the differences
between the observed percentages is shown in Table 2,
using the standard error for the difference between
percentages computed by McNemar’s formula 27a [6].
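The probability statements attached to the reported chi-squares can be checked against the chi-square distribution. For 1 degree of freedom the upper-tail probability is erfc(√(χ²/2)); for an even number k of degrees of freedom it is e^(−χ²/2) Σ (χ²/2)^i / i!, summed over i = 0 … k/2 − 1. The sketch below relies only on these two closed forms (an assumption that limits it to df = 1 and even df) and on the chi-square values printed in the text; the function name `chi_square_upper_tail` is illustrative.

```python
import math


def chi_square_upper_tail(x: float, df: int) -> float:
    """P(chi-square > x) for df == 1 or even df, using closed-form expressions."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df > 0 and df % 2 == 0:
        half = x / 2.0
        term, total = 1.0, 1.0  # i = 0 term of the Poisson-type sum
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    raise ValueError("this sketch handles only df == 1 or even df")


# Chi-square values and degrees of freedom as reported in the article.
reported = [
    ("3 samples x 3 diagnoses", 20.3916, 4),
    ("neurotic vs. nonneurotic", 1.2391, 1),
    ("normal vs. nonnormal", 9.9836, 1),
    ("cerebral pathology, psychotic sample", 10.9168, 1),
]

for label, x, df in reported:
    print(f"{label}: chi-square = {x}, df = {df}, "
          f"p = {chi_square_upper_tail(x, df):.5f}")
```

Run as written, the four probabilities come out at roughly .0004, .27, .0016, and .00095, in line with the text's "less than .001," "between .20 and .30," "just slightly greater than one in a thousand," and "less than .001."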


TABLE 2
SIGNIFICANCE OF DIFFERENCES BETWEEN CORRECT AND INCORRECT DIAGNOSES

                                               %        t
Normal Sample
  Normal Hewson Diagnosis                    46.15
                                                      1.085
  Nonnormal Hewson Diagnosis                 53.84
Neurotic Sample
  Neurotic Hewson Diagnosis                  44.05
                                                      3.00**
  Nonneurotic Hewson Diagnosis               55.95
Psychotic Sample
  Cerebral Pathology Hewson Diagnosis        44.25
                                                      3.22**
  Non-Cerebral Pathology Hewson Diagnosis    55.75

**Significant at .01 level.


The results of the present investigation 
show that for the samples here used the Hew- 
son summary diagnostic judgment gave incor- 
rect diagnostic indications more frequently 
than it gave correct ones. The technique is 
therefore regarded as being likely to mislead 
clinicians. Statistical tests applied to the re- 
sults show a significant tendency for the 
ratios to call normals normal and to call func- 
tional psychotics organic, while the tendency 
for the ratios to identify neurotics as neu- 
rotic is not significant. Although the normal 
and psychotic (as cerebral pathology) indi- 
cations are definitely group trends not attrib- 
utable to chance alone, they must be considered 


as of limited clinical use to the psychologist
attempting differential diagnosis from subtest
scores. That different nosological groups differ in
their abilities on the Wechsler is certainly true, but
the clues within the test which clinicians use are
still necessary. Probably the test on which the
technique is based, the Wechsler-Bellevue, is in need
of further refinement as a diagnostic instrument, as
Cronbach has suggested [2].


CONCLUSIONS 
1. The Hewson ratio method of differential 
diagnosis, while one of the most sophisticated 
methods yet devised, is not valid enough for 
clinical diagnosis of neurosis, functional psy- 
chosis, or normalcy. 


2. Because the technique as yet makes no
provision for functional psychosis, especially
schizophrenia, it cannot help but err when ap-
plied to such cases.

3. Since the method classifies large numbers 
of subjects without organic brain disease as 
cerebral pathology, it should be applied to 
such cases only with considerable caution. 

4. Significant tendencies for differential
diagnosis of some groups are present, but the
capacity of the method to diagnose any indi-
vidual case by itself is nearly nonexistent.


Received December 12, 1950. 


Early publication. 


REFERENCES 


1. Burton, A. J., and Harris, R. E. (Eds.) Case histories in clinical and abnormal psychology. New York: Harper, 1947.

2. Cronbach, L. J. Essentials of psychological testing. New York: Harper, 1949.

3. Harris, A. J., and Shakow, D. Scatter on the Stanford-Binet in schizophrenics, normals and delinquent adults. J. abnorm. soc. Psychol., 1938, 33, 100-111.

4. Hewson, Louise R. The Wechsler-Bellevue Scale and the Substitution Test as aids in neuropsychiatric diagnosis. J. nerv. ment. Dis., 1949, 109, 158-183, 246-266.

5. Mayman, M. Review of the literature on “scatter.” In D. Rapaport et al., Diagnostic psychological testing. Chicago: Year Book Publishers, 1949. Vol. I.







6. McNemar, Q. Psychological statistics. New York: Wiley, 1949.

7. Rabin, A. I. The use of the Wechsler-Bellevue scales with normal and abnormal persons. Psychol. Bull., 1945, 42, 410-422.

8. Rapaport, D., Gill, M., and Schafer, R. Diagnostic psychological testing. Chicago: Year Book Publishers, 1946. Vol. I.

Case reports in clinical psychology. I. Department of Psychology, Division of Psychiatry, Kings County Hospital, Brooklyn, New York,

9. Schafer, R. The clinical application of psychological tests. New York: International Universities Press, 1948.

10. Watson, R. I. The use of the Wechsler-Bellevue: A supplement. Psychol. Bull., 1946, 43, 61-68.

11. Wechsler, D. The measurement of adult intelligence. (3rd Ed.) Baltimore: Williams & Wilkins, 1944.





NEW BOOKS AND TESTS


Books


Allport, Gordon W. The nature of personality:
selected papers. Cambridge, Mass.: Addison- 
Wesley Press, 1950. Pp. vii + 220. $2.50. 


This collection of Allport’s papers includes his 
superb chapter on attitudes from the Handbook of 
Social Psychology (1935) and ten articles that ap- 
peared in journals from 1937 to 1947. All have 
been out of print or unavailable, and comprise a 
welcome resource for the instruction of advanced 
students and for psychologists’ private libraries. 
The articles center on the development of a per- 
sonalistic psychology, on the ego in psychology, and 
on the functional autonomy of motives. There is a 
bibliography of Allport’s other writings, complete 
to 1950. 


Bleuler, Eugen. Dementia praecox or the group of
schizophrenias. (Trans. by Joseph Zinkin.) New 
York: International Universities Press, 1950. Pp. 
xii + 548. $7.50. 

Bleuler’s monograph on schizophrenia which de-
fined that concept in 1911 has only now become
available in an English translation. It reveals Bleuler
as worthy of his reputation: a keen observer, a vivid
describer of symptoms and personalities, a sharp
critic of the unproved, and an original synthesizer of 
theory. To Bleuler, the primary phenomenon of 
schizophrenia was a disturbance of associations, a 
“loosening” of the thinking processes. All other 
symptoms were regarded as secondary, arising from 
the patient’s “desires, wishes, and fears which, be- 
cause of the disturbances of associations, are often 
distorted to the point of being unrecognizable.” He 
thus recognizes the dynamic origin of delusions and 
other individual symptoms. Even after forty years, 
Bleuler seems antiquated in only a few places. 
Perhaps he was in advance of his time or perhaps 
our understanding of schizophrenia has not pro- 
gressed markedly since his work. Or indeed, it may 
be both of these. 


Brosse, Therese. War-handicapped children. Paris:
UNESCO Publication No. 439, 1950. (New York: 
Columbia Univ. Press.) Pp. 142. 50¢. 


NOTE: Some reviews in this issue were prepared
by the Associate Editors, who may be identified by
their initials. Unsigned reviews are by the Editor.—
L. F. S.


Brosse, Therese. Homeless children. Report of the
proceedings of the Conference of Directors of
Children’s Communities, Trogen, Switzerland.
Paris: UNESCO Publication No. 573, 1950. (New
York: Columbia Univ. Press.) Pp. 76. 50¢.

These two UNESCO reports give a vivid picture 
of the scope and intensity of problems of the dis- 
placed, orphaned and deprived children whose lives 
were profoundly altered by World War II and its 
sequels. From a practical viewpoint, the booklets 
reveal a large group of children needing educational 
and therapeutic help whose condition is too little 
known to American psychologists. From a theoretical 
view, they provide fascinating, though hardly ade- 
quate, glimpses of ways in which personalities are 
molded by catastrophe, and of the ways in which 
some children readjusted and survived. 


Cantril, Hadley (Ed.), and Strunk, Mildred.
Public opinion 1935-1946. Princeton, N. J.: Prince-
ton Univ. Press, 1951. Pp. lix + 1191. $25.00.

This huge reference volume contains the results of
almost every public opinion poll conducted by 23
organizations in 16 countries between 1935 and 1946.
Information given for each survey includes the
country or area surveyed, the date, the question, and 
the percentage results, which are often broken down 
for various categories within the sample. Except for 
a general statement in the Introduction concerning 
the sizes and constitutions of the samples typically 
used by the several agencies, no detailed data are 
given about the populations surveyed for each ques- 
tion. The findings are arranged in alphabetical order 
by main categories from “Absenteeism” to “Worry,” 
and there is a 45-page cross index. 


Cattell, Raymond B. An introduction to personality
study. London: Hutchinson’s University Library, 
and New York: Longmans, Green, 1950. Pp. 235. 
Text edition $1.60, trade edition $2.00. 


This small book is an able summary of Cattell’s 
theories and researches in the field of personality. 
His theoretical position is a combination, blended 
with considerable harmony, of McDougall’s instinct 
theory, Freud’s dynamics, stimulus-response psycholo- 
gy, and factor analysis. 

Crawford, Paul L., Malamud, Daniel I., and
Dumpson, JAMES R. Working with teen-age gangs. 
New York: Welfare Council of New York City, 
1950. Pp. xi + 165. $2.75. 


A stimulating description of a three-year project 
in which a group of workers gained the confidence 


167 





of four Central Harlem street gangs and worked 
with the boys on their own terms. The monograph 
describes the gangs, their organization and activities, 
how the workers established contact with them, and 
how they encouraged constructive self-direction. 
Although many of the approaches were influenced 
by the circumstances under which the boys lived, 
the fundamental methods are widely applicable: 
accepting the boys as persons, and having real con- 
fidence in their ability to solve their own problems. 
The outcomes of the project are sketched, but a more 
detailed evaluative report is in preparation. 


Dennis, Wayne. (Ed.) Current trends in the rela- 
tion of psychology to medicine. Pittsburgh: Univ. 
of Pittsburgh Press, 1950. Pp. 189. $3.75. 

The eight lectures of 1950 in the University of 
Pittsburgh’s series on current trends in psychology 
reveal wider relationships between psychology and 
medicine than are commonly included in the concept 
of clinical psychology. There is emphasis on the role 
of the psychologist in public health, in medical 
education, and in collaborative research with medi- 
cine. Eysenck’s chapter on the relation between psy- 
chology and medicine in England is especially 
stimulating. 


Dollard, John, and Miller, Neal E. Personality
and psychotherapy. New York: McGraw-Hill, 
1950. Pp. xii + 488. $5.00. 

This stimulating book makes the praiseworthy 
attempt to bring together psychoanalysis, learning 
theory, and social anthropology, and from all of 
them to weld hypotheses about personality and about 
what happens in the therapeutic process. To do this, 
the authors have had to select their evidence with 
some care, but they are aware of the pitfalls of their 
task. Their analysis of therapy refers only to 
Freudian psychoanalysis, and although they dis- 
claim any attempt to teach therapy in this book, it 
is replete with specific directions as to what the 
therapist should and should not do. They emphasize 
strongly the role of verbalization, both explicitly 
and implicitly, and ignore recent evidence that 
therapeutic changes can take place in the absence of 
anything that could be called insight. The book con- 
cludes with two chapters on application to normal 
learning, the first a discussion of self-study, and the 
second on how to suppress troublesome thoughts to 
get freedom for creative thinking.—A. R. 


Eissler, Ruth S., et al. (Eds.) The psychoanalytic
study of the child. (Vol. V.) New York: Inter- 
national Universities Press, 1950. Pp. 410. $7.50. 

Volume V of The psychoanalytic study of the
child contains twenty-one papers, six of which were
read at a symposium on psychoanalysis and develop-
mental psychology, at the 1950 meeting of the
American Psychoanalytic Association in Detroit. The
keynote paper by Heinz Hartmann, on psychoanalysis
and developmental psychology, is interesting because
it hints that research in analytic therapy, and in the
prevention of personality disturbances, is important.





168 NEW BOOKS AND TESTS 


Ernst Kris, in another article, elaborates on the need 
for combining the psychoanalytic approach with the 
observational, while Rene Spitz writes further on 
research methodology. Perhaps a new era in analysis 
is about to dawn! Many of the other papers, how- 
ever, still consist of fragments of one or two cases
used to build up elaborate theories. Authors of other 
articles of special interest are: K. R. Eissler, who 
tries to explain why the treatment of delinquents is 
resisted by analysts more than the treatment of 
schizophrenics, although the former should be much 
easier to treat; David Beres and S. J. Obers, who 
report on a nonanalytic follow-up of 38 children with 
“extreme deprivation in infancy,” and who found 
that numerous previous studies were not justified in 
their conclusions that such children are doomed to 
severe maladjustment in later life; Bruno Bettelheim 
and Emmy Sylvester, who present an interesting dis- 
cussion of delinquency and morality; and Leo 
Rangell, who reports on the successful treatment of 
nightmares in a seven-year-old boy, directed by the 
analyst through correspondence, and conducted by 
the father. More of the therapeutic processes are 
revealed in this exchange of correspondence than in 
the detailed descriptions in some of the other articles. 
As was the case in the other volumes of The Psy- 
choanalytic Study of the Child, the various articles 
are uneven, but several of them will probably be- 
come bases for much discussion and research.—M. K.


Gulliksen, Harold. Theory of mental tests. New
York: Wiley, 1950. Pp. xix + 486. $6.00.

This book treats the mathematical theory and
statistical methods used in interpreting test results.
Although it is designed for readers with some knowl- 
edge of algebra, analytical geometry, and statistics, 
all of the major formulas from these disciplines as- 
sumed by the book are given in the appendix, and 
in his preface the author indicates what the reader 
without this background may omit. The volume 
begins with the theory dealing with the accuracy of 
test scores, and it proceeds to the theory underlying 
reliability and parallel tests. Various protocol prob- 
lems such as those related to the construction of 
parallel tests, criteria for parallel tests and methods 
for determining reliability, follow the discussion of 
reliability and validity. The later chapters in the 
book include methods of scoring, scaling, equating 
tests, and problems connected with batteries of tests 
and item selection. Professor Gulliksen has gathered 
into one volume from the periodical literature (the 
bibliography has 23 pages) the assumptions and the 
derivations upon which the formulas are based. The 
style is simple and the presentation is clear. This is 
indeed a contribution to measurement in psycholo- 
gy.—F. McK. 


Hellersberg, Elisabeth F. The individual's relation
to reality in our culture. Springfield, Ill.: Charles 
C Thomas, 1950. Pp. x + 128. $3.25. 


This monograph is essentially a manual for the 
Horn-Hellersberg Test. Although the ten-page Appendix
is entitled “Instructions for administration
and interpretation,” it is in itself a very inadequate 
manual, so it is fortunate that the body of the text 
is devoted largely to further instructions and to dis- 
cussions of a series of illustrative cases with whom 
the instrument has been used. The Horn-Hellersberg 
Test is a projective device in which the subject is in- 
structed to finish incomplete line drawings. This task 
is followed by an interview inquiry structured by 
the examiner to provide the necessary information 
for scoring and interpreting the subject’s test behav- 
ior. Although responses are scored on many deter-
minants (sequence, movement, originality, etc.), the
author regards the test as primarily useful in evalu- 
ating a subject’s relation to reality, and has derived 
a method of content scoring to provide an index of 
reality adjustment regardless of the subject’s cultural 
background. It would appear to be a useful tech- 
nique. The author states that she has used the test 
with 2500 cases and has “found that if 36 per cent 
of all drawn items are placed in the Objective Zone, 
the individual is still able to function normally as an 
adult in our civilization.” Unfortunately, however, 
no normative data are provided, and there is no 
indication as to what may be expected by way of 
test-retest reliability or interexaminer agreement in 
scoring and interpretation.—E. L. K.


Herma, Hans, and Kurth, Gertrud M. (Eds.)
Elements of psychoanalysis. Cleveland and New 
York: World Publishing Co., 1950. Pp. xi + 333. 
$3.00. 


A well-selected “psychoanalytic reader” for intelli- 
gent laymen, containing twenty-one chapters by as 
many authors. Most chapters are taken from previ- 
ous books and include contributions by many of the 
best names in psychoanalysis: Hendrick, Kubie, Brill,
Alexander, Anna Freud, Aichhorn, Flugel, Men- 
ninger and Wittels, to name only a few. Each of the 
two editors prepared a chapter especially for the 
volume: Herma on the unconscious, and Kurth on 
unconscious factors in social prejudice. 


Horney, Karen. Neurosis and human growth. New
York: Norton, 1950. Pp. 391. $3.75.

In this newest version of her theory, Horney sees 
the neurotic as a person who is alienated from the 
actualization of his “real self,” and has diverted his
energies to the building of a falsely idealized image.
Compulsively he searches for glory, “to lift himself
above others,” but inevitably failing to achieve an
impossible goal, uses various neurotic devices to com-
promise his failure. Psychotherapy strips away the
overidealized self through a disillusioning process, 
giving “the constructive forces of the real self” a 
chance to grow. Horney’s new stand is closer to that 
of Adler, but it is also very much closer to sheer 
mystification. It also has the serious fault of implying 
that the compulsive competitiveness of the neurotic 
is a universal human quality, rather than one bound 
to a particular cultural pattern. Horney’s neurotic 
rings quite true in Manhattan, but he is almost in- 
conceivable in Bali. Her earlier and simpler 
formulations will have a wider influence. 


Lorand, Sandor. Clinical studies in psychoanalysis.
New York: International Universities Press, 1950. 
Pp. 272. $4.00. 


A series of eighteen essays, fifteen from the 
author’s published papers and three from his earlier 
book, The Morbid Personality. Each paper centers 
on one or more case studies, which are interpreted 
in terms of orthodox Freudian psychoanalysis. 


MyYKk.Lesust, Hetmer R. Your deaf child: a guide 
for parents. Springfield, Ill.: Charles C Thomas, 
ix + 880. $10.00. 


Twenty-four chapters, each written by a specialist, 
are “designed to meet the interest of the general 
public in abnormal psychology, as well as the re- 
quirements of the student for a reference and source 
book.” Bibliographical references accompany each 
chapter. The coverage is good, style and exposition 
are adapted to the intended audience, and the treat- 
ment is adequate at the desired level. It should be
useful for its purpose.—W. A. H.


Myklebust, Helmer R. Your deaf child: a guide
for parents. Springfield, Ill.: Charles C Thomas, 
1950. Pp. xv + 133. $2.50. 


An exceptionally successful combination of sound
content, warm appeal, and excellent style makes this
volume an enviable example of desirable publication
on the clinical evaluation and treatment of the 
handicapped. Clear and simple in exposition, well- 
rounded in its whole-child concept, broad yet con- 
cise, understandable as well as professionally im- 
peccable—these are restrained commendations. The 
author’s pattern and scope in so difficult a field as 
the deaf suggest the prospect of similar books for 
all types of handicaps. Of special import is the 
emphasis on the child as a person within and beyond 
his disability. Parents will be specially grateful for 
this insight; clinical workers may profit even more 
from the emphasis on sound attitudes, dynamic needs, 
and maturational learning.—E. A. D. 


Rapaport, David. Emotions and memory. (2nd Un-
altered Ed.) New York: International Universities
Press, 1950. Pp. xiii + 282. $4.00. 


This useful monograph of 1942, for some time out 
of print, has been made available by a new publisher. 
Although Rapaport in his preface justifies the issuing 
of an “unaltered” new edition, many will feel sorry 
that he did not revise it to include the more recent 
evidence. Many aspects of Rapaport’s viewpoint have 
been supported very directly by subsequent memory 
experiments, such as those of Wallen, and are related 
clearly to the studies of attitude and perception, in- 
cluding those of Bruner, Postman and others. In 
fact, the eight years have dealt kindly with the 
volume, and its significance to today’s psychology is 
enhanced rather than diminished. It remains an in- 
dispensable reference to those interested in the 
relationships between experimental and clinical find-
ings.




Schilder, Paul. The image and appearance of the
human body. New York: International Universi- 
ties Press, 1950. Pp. 353. $4.50. 

This volume is a reissue of Schilder’s well-known 
work, first published in London in 1935, which has 
been out of print for some time. Its republication is 
timely in view of current interest in the body image 
as a psychological concept, and its recent applications 
to personality theory, projective techniques, and psy- 
chotherapy. 


Shore, Maurice J., et al. Twentieth century mental
hygiene. New York: Social Sciences, 1950. Pp. 444. 
$6.00. 


In this small volume are presented the rich and 
diversified fields of mental hygiene, both past and 
present. Nineteen authors discuss the contributions of 
the various disciplines to mental hygiene and the 
solutions it offers to various social ills. There are 
brief resumes of the development of mental hygiene 
in various countries. Throughout the book constant 
attention is paid to the problems still to be met. The 
chapters are short, and vary in the skill of the 
authors to compress into a very limited space vital 
information in their area. The book as a whole 
should prove most interesting and stimulating to 
those who wish a quick overview of present develop- 
ments in mental health.—B. M. L. 


Sifferd, Calvin S. Residence hall counseling.
Bloomington, Ill.: McKnight & McKnight, 1950. 
Pp. xviii + 238. $3.00. 

A practical volume on the problems of residence 
hall counseling, developed from the program of the 
University of Illinois. Intended as a guide for coun- 
selors with limited preparation, it describes in a 
detailed but simple way the organization and oper- 
ation of a guidance program at the grass-roots level. 
There is a good chapter of case studies. 


The mental health program of the forty-eight states. 
1313 E. 60th St., Chicago, Ill.: The Council of 
State Governments, 1950. Pp. x + 377. $4.00. 

In 1949 the Governor’s Conference directed the 
Council of State Governments to make a factual 
study of the mental hygiene activities and facilities 
of the states. The report is a valuable handbook on 
the history and background of mental hygiene activi- 
ties, and of the present status of organization, fi- 
nance, personnel, and treatment. Appendices include 
120 pages of tables. The needs for clinical psy- 
chologists expressed in the summary recommendations 
are in some contrast to the number actually in service 
in state hospitals and clinics. 


TESTS 


Mooney, Ross L., and Gordon, Leonard V. Mooney
Problem Check Lists. Grades 7-9, 10-12, college, 
and adult. 4 forms, one for each level. Untimed 
(25-50) min. Questionnaire blanks ($1.50 per 25, 
$5.00 per 100) with manual, pp. 15, for school 
forms, or manual, pp. 4, for adult form; specimen 
set, one form (35¢). New York: Psychological 
Corp., 1950. 


The Problem Check Lists are not intended to be 
used as tests, but are designed to help people tell 
about their problems. Each blank consists of a list of 
items, varying in number from 210 to 330 on the 
several forms, that represent areas in which people 
have difficulties. The original lists were compiled 
from large numbers of free responses, from case re- 
cords, and from reviews of the literature on student 
problems. The 1950 revisions have eliminated items 
infrequently checked, retained items having retest 
stability, and improved the grade placement of some 
statements. The manuals discuss the uses of the 
check lists as aids to counseling and to group guid- 
ance programs, and as tools for surveys and research. 


Thurstone, L. L. Thurstone Temperament Sched-
ule. High school-college-adult. 1 form. Untimed
(20 min.). Questionnaire blank (48¢); self-scoring
answer pad ($1.75 per 25); IBM answer sheet 
($2.90 per 100); with manual, pp. 11; specimen 
set (75¢). Chicago: Science Research Associates, 
1949, 1950. 


A distinctive and perhaps valuable feature of this 
questionnaire is its emphasis on normality. There are 
no items describing neurotic symptoms, and the seven 
scores that are obtained from it are all designated 
by trait terms descriptive of ways in which normal 
people may vary—active, vigorous, impulsive, domi- 
nant, stable, sociable, and reflective. The seven clus- 
ters in which the 140 items are arranged were ob- 
tained from factor analyses by Thurstone and Guil- 
ford. The actual items were selected for internal 
consistency within each cluster. The reliabilities are 
moderate, .48 to .77 for the adult male group, the 
intercorrelations between trait scores are generally 
low, and the norms are based on adequately large 
groups. Despite all these statistical virtues, the inter- 
pretation of the schedule is almost surely subject to 
abuses. Many users will be tempted to follow the in- 
triguing suggestions for using it in vocational guid- 
ance, without paying enough attention to the cau- 
tious footnote that calls attention to the need for valid- 
ation. 
















PSYCHOLOGICAL MONOGRAPHS: GENERAL 
AND APPLIED 


Volume 64, 1950

Patterns of Personality Rigidity and Some of Their Determinants. Seymour Fisher,
Elgin State Hospital. No. 307, $1.00

The Value of an Oral Reading Test for Diagnosis of the Reading Difficulties of
College Freshmen of Low Academic Performance. Charles A. Wells, American
International College. No. 308, $1.00

Rorschach Responses Related to Vocational Interests and Job Satisfaction. Solis
L. Kates, Michigan State College. No. 309, $1.00

Symbol Elaboration Test (S.E.T.): The Reliability and Validity of a New Projec-
tive Technique. Johanna Krout, Chicago Psychological Institute. No. 310, $2.00

Changes in Responses to the Minnesota Multiphasic Inventory Following Certain
Therapies. William Schofield, University of Minnesota. No. 311, $1.00

A Scale for Measuring Teacher-Pupil Attitudes and Teacher-Pupil Rapport. Carroll
H. Leeds, Furman University. No. 312, $1.00

The Nature and Efficacy of Methods of Attack on Reasoning Problems. Benjamin
Burack, Roosevelt College. No. 313, $1.00

The Validity of a Multiple-Choice Projective Test in Psychopathological Screening.
Martin Singer, VA Hospital, Long Island. No. 314, $1.00

A Normative Study of the Thematic Apperception Test. Leonard D. Eron, Yale
University. No. 315, $1.50

Experimentally Induced Variations in Rorschach Performance. Edith E. Lord,
Arizona State Department of Health. No. 316, $1.00

An Evaluation of Personality-Trait Ratings Obtained by Unstructured Assessment
Interviews. Ernest C. Tupes, United States Air Force. No. 317, $1.00


American Psychological Association 


1515 Massachusetts Avenue N.W., Washington 5, D.C. 


































34 authorities wrote this definitive book...
HANDBOOK of EXPERIMENTAL PSYCHOLOGY 


Edited by S. S. Stevens, Harvard University. Thirty-four
leaders in the fields of physiology, psychology, medicine, and 
physics contributed nearly one million words to this new book, 
making it the most complete study of experimental psychology 
available. (The topics on the right are only a partial list of the 
fields covered in the book.) 


Handbook of Experimental Psychology is a unique book, since it 
bridges the gap between elementary textbooks and the specialized 
journals. It gives an up-to-date, more complete review of the 
fields covered in Carl Murchison’s Handbook of General Ex- 
perimental Psychology, now out of print. Ready in 4; 1951. 
1436 pages. Prob. $15.00. 


THE ORGANIZATION OF BEHAVIOR 


“New . . . Different . . . Exciting . . .”
“One of the most important psychological books of the past 50 years.”

A Neuropsychological Theory. By D. O. Hebb, McGill University.
“Hebb’s Organization of Behavior is an outstanding book and one of the most
important psychological books of the past 50 years. It is full of ideas—most of
them good and all of them stimulating—and it brings together many previously
neglected researches into a new framework and suggests new solutions to a gamut
of problems in many fields of psychology. The book is new, different, and exciting,
and very stimulating. I believe every psychologist . . . should read it.”
—C. T. Morgan
Chairman, Department of Psychology
The Johns Hopkins University
1950. 335 pages.


THE STRUCTURE OF HUMAN ABILITIES 


“A first-class piece of work . . .”
“A definitive systematization . . .”

By Philip E. Vernon, University of London.
“The book is, in my judgment, a first-class piece of work. Vernon does not so
much bring to bear new statistical methods in his effort to advance our knowledge
of the structure of human abilities as he does bring together and synthesize in an
amazingly capable fashion a wide range of relevant English and American researches.
“Vernon’s volume should best be regarded as a definitive systematization of all we
know currently about abilities from a statistical analysis of tests.”
—Dr. Stanley G. Estes
Harvard University

1951. 151 pages. $2.7


Send for copies on approval 


JOHN WILEY & SONS, INc. 440 4th Ave., New York 16, N. Y. 



















